Description
METHODS AND COMPOSITIONS FOR 
EFFICIENT NUCLEIC ACID SEQUENCING 

The present application is a continuation-in-part of
co-pending U.S. Patent Application Serial No. 08/303,058,
filed September 08, 1994; which is a continuation-in-part
of U.S. Patent Application Serial No. 08/127,420, filed
September 27, 1993; the entire text and figures of which
disclosures are specifically incorporated herein by
reference without disclaimer. The U.S. Government owns
rights in the present invention pursuant to Department of
Energy grant LDRD 03235 and Contract No. W-31-109-ENG-38
between the U.S. Department of Energy and The University
of Chicago, representing Argonne National Laboratory.

BACKGROUND OF THE INVENTION 
1. Field of the Invention

The present invention generally relates to the field 
of molecular biology. The invention particularly 
provides novel methods and compositions to enable highly 
efficient sequencing of nucleic acid molecules. The 
methods of the invention are suitable for sequencing long 
nucleic acid molecules, including chromosomes and RNA, 
without cloning or subcloning steps. 

2. Description of the Related Art

Nucleic acid sequencing forms an integral part of
scientific progress today. Determining the sequence,
i.e., the primary structure, of nucleic acid molecules and
segments is important in regard to individual projects
investigating a range of particular target areas.
Information gained from sequencing impacts science,
medicine, agriculture and all areas of biotechnology.
Nucleic acid sequencing is, of course, vital to the human
genome project and other large-scale undertakings, the
aim of which is to further our understanding of evolution
and the function of organisms and to provide an insight
into the causes of various disease states.

The utility of nucleic acid sequencing is evident;
for example, the Human Genome Project (HGP), a
multinational effort devoted to sequencing the entire
human genome, is in progress at various centers.
However, progress in this area is generally both slow and
costly. Nucleic acid sequences are usually determined on
polyacrylamide gels that separate DNA fragments in the
range of 1 to 500 bp, differing in length by one
nucleotide. The actual determination of the sequence,
i.e., the order of the individual A, G, C and T
nucleotides, may be achieved in two ways: firstly, using
the Maxam and Gilbert method of chemically degrading the
DNA fragment at specific nucleotides (Maxam & Gilbert,
1977), or secondly, using the dideoxy chain termination
sequencing method described by Sanger and colleagues
(Sanger et al., 1977). Both methods are time-consuming
and laborious.

More recently, other methods of nucleic acid
sequencing have been proposed that do not employ an
electrophoresis step; these methods may be collectively
termed Sequencing By Hybridization or SBH (Drmanac et
al., 1991; Cantor et al., 1992; Drmanac & Crkvenjakov,
U.S. Patent 5,202,231). Development of certain of these
methods has given rise to new solid support type
sequencing tools known as sequencing chips. The utility
of SBH in general is evidenced by the fact that U.S.
Patents have been granted on this technology. However,
although SBH has the potential for increasing the speed
with which nucleic acids can be sequenced, all current
SBH methods still suffer from several drawbacks.

SBH can be conducted in two basic ways, often
referred to as Format 1 and Format 2 (Cantor et al.,
1992). In Format 1, oligonucleotides of unknown
sequence, generally of about 100-1000 nucleotides in
length, are arrayed on a solid support or filter so that
the unknown samples themselves are immobilized (Strezoska
et al., 1991; Drmanac & Crkvenjakov, U.S. Patent
5,202,231). Replicas of the array are then interrogated
by hybridization with sets of labeled probes of about 6
to 8 residues in length. In Format 2, a sequencing chip
is formed from an array of oligonucleotides with known
sequences of about 6 to 8 residues in length (Southern,
WO 89/10977; Khrapko et al., 1991; Southern et al.,
1992). The nucleic acids of unknown sequence are then
labeled and allowed to hybridize to the immobilized
oligos.

Unfortunately, both of these SBH formats have
several limitations, particularly the requirement for
prior DNA cloning steps. In Format 1, other significant
problems include attaching the various nucleic acid
pieces to be sequenced to the solid surface support or
preparing a large set of longer probes. In Format 2,
major problems include labelling the nucleic acids of
unknown sequence, high noise to signal ratios that
generally result, and the fact that only short sequences
can be determined. Further problems of Format 2 include
the secondary structure formation that prevents access to
some targets and the different conditions that are
necessary for probes with different GC contents.

Therefore, the art would clearly benefit from a new
procedure for nucleic acid sequencing, and particularly,
one that avoids the tedious processes of cloning and/or
subcloning.

SUMMARY OF THE INVENTION

The present invention seeks to overcome these and
other drawbacks inherent in the prior art by providing
new methods and compositions for the sequencing of
nucleic acids. The novel techniques described herein
have been generally termed Format 3 by the inventors and
represent marked improvements over the existing Format 1
and Format 2 SBH methods. In the Format 3 sequencing
provided by the invention, nucleic acid sequences are
determined by means of hybridization with two sets of
small oligonucleotide probes of known sequences. The
methods of the invention allow high discriminatory
sequencing of extremely large nucleic acid molecules,
including chromosomal material or RNA, without prior
cloning, subcloning or amplification. Furthermore, the
present methods do not require large numbers of probes,
the complex synthesis of longer probes, or the labelling
of a complex mixture of nucleic acid segments.

To determine the sequence of a nucleic acid
according to the methods of the present invention, one
would generally identify sequences from the nucleic acid
by hybridizing with complementary sequences from two sets
of small oligonucleotide probes (oligos) of defined
length and known sequence, which cover most combinations
of sequences for that length of probe. One would then
analyze the sequences identified to determine stretches
of the identified sequences that overlap, and reconstruct
or assemble the complete nucleic acid sequence from such
overlapping sequences.

The sequencing methods may be conducted using
sequential hybridization with complementary sequences
from the two sets of small oligos. Alternatively, a mode
described as "cycling" may be employed, in which the two
sets of small oligos are hybridized with the unknown
sequences simultaneously. The term "cycling" is applied
as the discriminatory part of the technique comes from
then increasing the temperature to "melt" those hybrids
that are non-complementary. Such cycling techniques are
commonly employed in other areas of molecular biology,
such as PCR, and will be readily understood by those of
skill in the art in light of the present disclosure.

The invention is applicable to sequencing nucleic
acid molecules of very long length. As a practical
matter, the nucleic acid molecule to be sequenced will
generally be fragmented to provide small or intermediate
length nucleic acid fragments that may be readily
manipulated. The term nucleic acid fragment, as used
herein, most generally means a nucleic acid molecule of
between about 10 base pairs (bp) and about 100 bp in
length. The most preferred methods of the invention are
contemplated to be those in which the nucleic acid
molecule to be sequenced is treated to provide nucleic
acid fragments of intermediate length, i.e., of between
about 10 bp and about 40 bp. However, it should be
stressed that the present invention is not a method of
completely sequencing small nucleic acid fragments;
rather, it is a method of sequencing nucleic acid
molecules per se, which involves determining portions of
sequence from within the molecule, whether this is done
using the whole molecule or, for simplicity, by first
fragmenting the molecule into smaller sized sections of
from about 4 to about 1000 bases.

Sequences from nucleic acid molecules are determined
by hybridizing to small oligonucleotide probes of known
sequence. In referring to "small oligonucleotide
probes", the term "small" means probes of less than 10 bp
in length, and preferably, probes of between about 4 bp
and about 9 bp in length. In one exemplary sequencing
embodiment, probes of about 6 bp in length are
contemplated to be particularly useful. For the sets of
oligos to cover all combinations of sequences for the
length of probe chosen, their number will be represented
by 4^F, wherein F is the length of the probe. For
example, for a 4-mer, the set would contain 256 probes;
for a 5-mer, the set would contain 1024 probes; for a
6-mer, 4096 probes; a 7-mer, 16384 probes; and the like.
The synthesis of oligos of this length is very routine in
the art and may be achieved by automated synthesis.
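
The 4^F relationship can be checked with a short sketch that
enumerates every probe of a given length. The function name
all_probes and the lexicographic ordering are assumptions made
purely for illustration; they are not part of the disclosure.

```python
from itertools import product

BASES = "ACGT"

def all_probes(length):
    """Return every oligonucleotide sequence of the given length.

    The lexicographic order (A < C < G < T) is an arbitrary,
    illustrative choice.
    """
    return ["".join(p) for p in product(BASES, repeat=length)]

# The set size equals 4**F for each probe length F.
for f in (4, 5, 6, 7):
    print(f, len(all_probes(f)), 4 ** f)
```

The counts printed for F = 4, 5, 6 and 7 can be compared directly
with the figures of 256, 1024, 4096 and 16384 given above.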

In the methods of the invention, one set of the
small oligonucleotide probes of known sequence, which may
be termed the first set, will be attached to a solid
support, i.e., immobilized on that support in such a way
that they are available to take part in hybridization
reactions. The other set of small oligonucleotide probes
of known sequence, which may be termed the second set,
will be probes that are in solution and that are labelled
with a detectable label. The sets of oligos may include
probes of the same or different lengths.

The process of sequential hybridization means that
nucleic acid molecules, or fragments, of unknown sequence
can be hybridized to the distinct sets of oligonucleotide
probes of known sequences at separate times (FIG. 1).
The nucleic acid molecules or fragments will generally be
denatured, allowing hybridization, and added to the
first, immobilized set of probes under discriminating
hybridization conditions to ensure that only fragments
with complementary sequences hybridize. Fragments with
non-complementary sequences are removed and the next
round of discriminating hybridization is then conducted
by adding the second, labelled set of probes, in
solution, to the combination of fragments and probes
already formed. Labelled probes that hybridize adjacent
to a fixed probe will remain attached to the support and
can be detected, which is not the case when there is
space between the fixed and labelled probes (FIG. 1).

The process of simultaneous hybridization means that
the unknown sequence nucleic acid molecules can be
contacted with the distinct sets of oligonucleotide
probes of known sequences at the same time.
Hybridization will occur under discriminating
hybridization conditions. Fragments with
non-complementary sequences are then "melted", i.e., removed
by increasing the temperature, and the next round of
discriminating hybridization is then conducted, allowing
any complementary second probes to hybridize. Labelled
probes that hybridize adjacent to a fixed probe will then
be detected in the same manner.

Nucleic acid sequences that are "complementary" are
those that are capable of base-pairing according to the
standard Watson-Crick complementarity rules, and
variations of the rules as they apply to modified bases.
That is, the larger purines, or modified purines,
will always base pair with the smaller pyrimidines to
form only known combinations. These include the standard
pairs of guanine paired with cytosine (G:C) and adenine
paired with either thymine (A:T), in the case of DNA, or
uracil (A:U), in the case of RNA. The use of modified
bases, or the so-called Universal Base (M, Nichols et
al., 1994), is also contemplated.

As used herein, the term "complementary sequences"
means nucleic acid sequences that are substantially
complementary over their entire length and have very few
base mismatches. For example, nucleic acid sequences of
six bases in length may be termed complementary when they
hybridize at five out of six positions with only a single
mismatch. Naturally, nucleic acid sequences that are
"completely complementary" will be nucleic acid sequences
that are entirely complementary throughout their entire
length and have no base mismatches.
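
The distinction between "complementary" and "completely
complementary" sequences can be sketched as a mismatch count
against the Watson-Crick partner strand. This is a minimal
illustration for unmodified DNA bases only; the function names
and the default tolerance of one mismatch (taken from the
five-out-of-six example above) are assumptions, and real
hybridization behaviour depends on conditions that this sketch
does not model.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the Watson-Crick partner strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def mismatches(probe, target):
    """Count positions at which probe fails to pair with target.

    Both sequences are written 5' to 3' and must have equal length;
    pairing is antiparallel.
    """
    if len(probe) != len(target):
        raise ValueError("probe and target must have equal length")
    perfect_partner = reverse_complement(target)
    return sum(1 for p, q in zip(probe, perfect_partner) if p != q)

def is_completely_complementary(probe, target):
    """True when there are no base mismatches at all."""
    return mismatches(probe, target) == 0

def is_complementary(probe, target, max_mismatches=1):
    """True when the mismatch count is within the assumed tolerance."""
    return mismatches(probe, target) <= max_mismatches
```

For instance, under these assumptions a 6-mer probe with one
mispaired base against its target would satisfy is_complementary
but not is_completely_complementary.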

After identifying, by hybridization to the oligos of
known sequence, various individual sequences that are
part of the nucleic acid fragments, these individual
sequences are next analyzed to identify stretches of
sequences that overlap. For example, portions of
sequences in which the 5' end is the same as the 3' end
of another sequence, or vice versa, are identified. The
complete sequence of the nucleic acid molecule or
fragment can then be delineated, i.e., it can be
reconstructed from the overlapping sequences thus
determined.
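
Identifying overlaps of this kind can be sketched as a
suffix-prefix comparison between two identified sequences. The
helper names overlap and merge, and the min_len parameter, are
hypothetical and serve only to illustrate the idea; they do not
describe any particular program used by the inventors.

```python
def overlap(left, right, min_len=1):
    """Length of the longest 3' end of left that equals the 5' end
    of right; returns 0 if no overlap of at least min_len exists."""
    for n in range(min(len(left), len(right)), min_len - 1, -1):
        if left[-n:] == right[:n]:
            return n
    return 0

def merge(left, right, min_len=1):
    """Join two overlapping sequences, or return None if they do
    not overlap by at least min_len bases."""
    n = overlap(left, right, min_len)
    return left + right[n:] if n else None

# Two 12-mers that share 11 bases are joined into a 13-mer.
print(merge("AAAAAATTTTTT", "AAAAATTTTTTC"))
```

Requiring a long minimum overlap (for example, one base less than
the length of the identified sequences) is one way to reduce
spurious joins; the appropriate threshold is a design choice.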

The processes of identifying overlapping sequences
and reconstructing the complete sequence will generally
be achieved by computational analysis. For example, if a
labelled probe 5'-TTTTTT-3' hybridizes to the spot
containing the fixed probe 5'-AAAAAA-3', a 12-mer
sequence from within the nucleic acid molecule is
defined, namely 5'-AAAAAATTTTTT-3' (SEQ ID NO:1), i.e.,
the sequences of the two hybridized probes are combined to
reveal a previously unknown sequence. The next question
to be answered is which nucleotide follows the
newly determined 5'-AAAAAATTTTTT-3' (SEQ ID NO:1)
sequence. There are four possibilities, represented by
the fixed probe 5'-AAAAAT-3' and labelled probes
5'-TTTTTA-3' for A; 5'-TTTTTT-3' for T; 5'-TTTTTC-3' for
C; and 5'-TTTTTG-3' for G. If, for example, the probe
5'-TTTTTC-3' is positive and the other three are
negative, then the assembled sequence is extended to
5'-AAAAAATTTTTTC-3' (SEQ ID NO:2). In the next step, an
algorithm determines which of the labelled probes TTTTCA,
TTTTCT, TTTTCC or TTTTCG is positive at the spot
containing the fixed probe AAAATT. The process is
repeated until all positive (F + P) oligonucleotide
sequences are used or defined as false positives.
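
The extension procedure in this example can be sketched as
follows. This is a simplified illustration, not the inventors'
algorithm: it assumes positive results are supplied as a set of
(fixed probe, labelled probe) pairs written in the same
orientation as the worked example, it extends only in the 3'
direction, and it simply stops at a branch point or dead end
instead of resolving false positives. The function name extend is
hypothetical.

```python
BASES = "ACGT"

def extend(seed_fixed, seed_labelled, positives):
    """Extend a seed sequence one base at a time.

    positives is a set of (fixed, labelled) probe pairs that gave a
    signal. Extension stops when zero or several of the four
    candidate pairs are positive, or when a pair would be reused.
    """
    f, p = len(seed_fixed), len(seed_labelled)
    sequence = seed_fixed + seed_labelled
    used = {(seed_fixed, seed_labelled)}
    while True:
        window = sequence[-(f + p - 1):]      # last F + P - 1 bases
        fixed, stem = window[:f], window[f:]  # next spot, probe stem
        hits = [(fixed, stem + b) for b in BASES
                if (fixed, stem + b) in positives]
        if len(hits) != 1 or hits[0] in used:
            return sequence
        used.add(hits[0])
        sequence += hits[0][1][-1]            # append the new base

positives = {("AAAAAA", "TTTTTT"), ("AAAAAT", "TTTTTC")}
print(extend("AAAAAA", "TTTTTT", positives))
```

With the two positive pairs shown, the seed AAAAAATTTTTT is
extended by a single C, matching the step from SEQ ID NO:1 to
SEQ ID NO:2 described above.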

The present invention thus provides a very effective
way to sequence nucleic acid fragments and molecules of
long length. Large nucleic acid molecules, as defined
herein, are those molecules that need to be fragmented
prior to sequencing. They will generally be of at least
about 45 or 50 base pairs (bp) in length, and will most
often be longer. In fact, the methods of the invention
may be used to sequence nucleic acid molecules with
virtually no upper limit on length, so that sequences of
about 100 bp, 1 kilobase (kb), 100 kb, 1 megabase (Mb),
and 50 Mb or more may be sequenced, up to and including
complete chromosomes, such as human chromosomes, which
are about 100 Mb in length. Such a large number is well
within the scope of the present invention and sequencing
this number of bases will require two sets of 8-mers or
9-mers (so that F + P ≈ 16-18). The nucleic acids to be
sequenced may be DNA, such as cDNA, genomic DNA,
microdissected chromosome bands, cosmid DNA or YAC
inserts, or may be RNA, including mRNA, rRNA, tRNA or
snRNA.
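
One way to appreciate why a combined length F + P of about 16-18
is suggested for targets of about 100 Mb is a rough estimate of
how often a given sequence of that length would recur by chance.
The sketch below is a back-of-the-envelope illustration only: it
assumes a random sequence with equal base frequencies, which is
not stated in the text and does not hold for real genomes, and
the function name is hypothetical.

```python
def expected_chance_occurrences(target_length, k):
    """Expected number of times one particular k-mer occurs by
    chance in a random sequence of target_length bases (assuming
    independent, equally frequent bases)."""
    positions = max(target_length - k + 1, 0)
    return positions / 4 ** k

# Compare a combined probe length of 12 with lengths of 16 and 18
# for a target of about 100 Mb.
for k in (12, 16, 18):
    print(k, expected_chance_occurrences(100_000_000, k))
```

Under this simplified model, a 12-mer is expected to recur
several times in 100 Mb, whereas a 16-mer or 18-mer is expected
to occur by chance well under once, so that overlaps between
identified sequences are far less ambiguous.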

The process of determining the sequence of a long
nucleic acid molecule involves simply identifying
sequences of length F + P from the molecule and combining
the sequences using a suitable algorithm. In practical
terms, one would most likely first fragment the nucleic
acid molecule to be sequenced to produce smaller
fragments, such as intermediate length nucleic acid
fragments. One would then identify sequences of length
F + P by hybridizing, e.g., sequentially hybridizing, the
fragments to complementary sequences from the two sets of
small oligonucleotide probes of known sequence, as
described above. In this manner, the complete nucleic
acid sequence of extremely large molecules can be
reconstructed from overlapping sequences of length F + P.

Whether the nucleic acid to be sequenced is itself
an intermediate length fragment or is first treated to
generate such length fragments, the process of
identifying sequences from such nucleic acid fragments by
hybridizing to two sets of small oligonucleotide probes
of known sequence is central to the sequencing methods
disclosed herein. This process generally comprises the
following steps:

(a) contacting the set or array of attached or
immobilized oligonucleotide probes with the
nucleic acid fragments under hybridization
conditions effective to allow fragments with a
complementary sequence to hybridize
sufficiently to a probe, thereby forming
primary complexes wherein the fragment has both
hybridized and non-hybridized, or "free",
sequences;

(b) contacting the primary complexes with the set
of labelled oligonucleotide probes in solution
under hybridization conditions effective to
allow probes with complementary sequences to
hybridize to a non-hybridized or free fragment
sequence, thereby forming secondary complexes
wherein the fragment is hybridized to both an
attached (immobilized) probe and a labelled
probe;

(c) removing from the secondary complexes any
labelled probes that have not hybridized
adjacent to an attached probe, thereby leaving
only adjacent secondary complexes;

(d) detecting the adjacent secondary complexes by
detecting the presence of the label in the
labelled probe; and

(e) identifying oligonucleotide sequences from the
nucleic acid fragments in the adjacent
secondary complexes by combining or connecting
the known sequences of the hybridized attached
and labelled probes.
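
The outcome of steps (a) to (e) can be modelled in silico for a
single fragment. The sketch below is an idealized illustration
under stated assumptions: every hybridization is completely
complementary, only immediately adjacent probe pairs survive, and
probe sequences are written in the same orientation as the
determined sequence, following the convention of the AAAAAA/TTTTTT
example given earlier. The function names are hypothetical.

```python
def adjacent_probe_pairs(fragment, f, p):
    """Return the (attached, labelled) probe pairs expected to be
    detected for one fragment: every window of length f + p, split
    into its first f bases and its following p bases."""
    k = f + p
    return {(fragment[i:i + f], fragment[i + f:i + k])
            for i in range(len(fragment) - k + 1)}

def identified_sequences(pairs):
    """Step (e): join the known sequences of each probe pair."""
    return {fixed + labelled for fixed, labelled in pairs}

pairs = adjacent_probe_pairs("AAAAAATTTTTTC", 6, 6)
print(sorted(identified_sequences(pairs)))
```

A fragment shorter than F + P yields no pairs in this model,
which is consistent with requiring fragments at least as long as
the combined probe length.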

The hybridization or 'washing conditions' chosen to
conduct either one, or both, of the hybridization steps
may be manipulated according to the particular sequencing
embodiment chosen. For example, both of the
hybridization conditions may be designed to allow
oligonucleotide probes to hybridize to a given nucleic
acid fragment when they contain complementary sequences,
i.e., substantially matching sequences, such as those
sequences that hybridize at five out of six positions.
The hybridization steps would preferably be conducted
using a simple robotic device as is routinely used in
current sequencing procedures.



Alternatively, the hybridization conditions may be
designed to allow only those oligonucleotide probes and
fragments that have completely complementary sequences to
hybridize. These more discriminating or 'stringent'
conditions may be used for both distinct steps of a
sequential hybridization process or for either step
alone. In such cases, the oligonucleotide probes,
whether immobilized or labelled probes, would only be
allowed to hybridize to a given nucleic acid fragment
when they shared completely complementary sequences with
the fragment.



The hybridization conditions chosen will generally
dictate the degree of complexity required to analyze the
data obtained. Equally, the computer programs available
to analyze any data generated may dictate the
hybridization conditions that must be employed in a given
laboratory. For example, in the most discriminating
process, both hybridization steps would be conducted
under conditions that allow only oligos and fragments
with completely complementary sequences to hybridize. As
there will be no mismatched bases, this method involves
the least complex computational analyses and, for this
reason, it is the currently preferred method for
practicing the invention. However, the use of less
discriminating conditions for one or both hybridization
steps also falls within the scope of the present
invention.

Suitable hybridization conditions for use in either
or both steps may be routinely determined by optimization
procedures or 'pilot studies'. Various types of pilot
studies are routinely conducted by those skilled in the
art of nucleic acid sequencing in establishing working
procedures and in adapting a procedure for use in a given
laboratory. For example, conditions such as the
temperature; the concentration of each of the components;
the length of time of the steps; the buffers used and
their pH and ionic strength may be varied and thereby
optimized.

In preferred embodiments, the nucleic acid
sequencing method of the invention involves a
discriminating step to select for secondary hybridization
complexes that include immediately adjacent immobilized
and labelled probes, as distinct from those that are not
immediately adjacent and are separated by one, two or
more bases. A variety of processes are available for
removing labelled probes that are not hybridized
immediately adjacent to an attached probe, i.e., not
hybridized back to back, each of which leaves only the
immediately adjacent secondary complexes.

Such discriminatory processes may rely solely on
washing steps of controlled stringency wherein the
hybridization conditions employed are designed so that
immediately adjacent probes remain hybridized due to
the increased stability afforded by the stacking
interactions of the adjacent nucleotides. Again, washing
conditions such as temperature, concentration, time,
buffers, pH, ionic strength and the like, may be varied
to optimize the removal of labelled probes that are not
immediately adjacent.

In preferred embodiments the immediately adjacent
immobilized and labelled probes would be ligated, i.e.,
covalently joined, prior to performing washing steps to
remove any non-ligated probes. Ligation may be achieved
by treating with a solution containing a chemical
ligating agent, such as, e.g., water-soluble carbodiimide
or cyanogen bromide. More preferably, a ligase enzyme,
such as T4 DNA ligase from T4 bacteriophage, which is
commercially available from many sources (e.g., Biolabs),
may be employed. In any event, one would then be able to
remove non-immediately adjacent labelled probes by more
stringent washing conditions that cannot affect the
covalently connected labeled and fixed probes.

The remaining adjacent secondary complexes would be
detected by observing the location of the label from the
labelled probes present within the complexes. The
oligonucleotide probes may be labeled with a
chemically-detectable label, such as fluorescent dyes, or adequately
modified to be detected by a chemiluminescent developing
procedure, or radioactive labels such as 35S, 3H, 32P or
33P, with 33P currently being preferred. Probes may also
be labeled with non-radioactive isotopes and detected by
mass spectrometry.

Currently, the most preferred method contemplated
for practicing the present invention involves performing
the hybridization steps under conditions designed to
allow only those oligonucleotide probes and fragments
that have completely complementary sequences to hybridize
and that allow only those probes that are immediately
adjacent to remain hybridized. This method subsequently
requires the least complex computational analysis.

Where the nucleic acid molecule of unknown sequence
is longer than about 45 or 50 bp, one effective method
for determining its sequence generally involves treating
the molecule to generate nucleic acid fragments of
intermediate length, and determining sequences from the
fragments. The nucleic acid molecule, whether it be DNA
or RNA, may be fragmented by any one of a variety of
methods including, for example, cutting by restriction
enzyme digestion, shearing by physical means such as
ultrasound treatment, NaOH treatment or low
pressure shearing.

In certain embodiments, e.g., involving small
oligonucleotide probes between about 4 bp and about 9 bp
in length, one may aim to produce nucleic acid fragments
of between about 10 bp and about 40 bp in length.
Naturally, longer length probes would generally be used
in conjunction with sequencing longer length nucleic acid
fragments, and vice versa. In certain preferred
embodiments, the small oligonucleotide probes used will
be about 6 bp in length and the nucleic acid fragments to
be sequenced will generally be about 20 bp in length. If
desired, fragments may be separated by size to obtain
those of an appropriate length, e.g., fragments may be
run on a gel, such as an agarose gel, and those with
approximately the desired length may be excised.

The method for determining the sequence of a nucleic
acid molecule may also be exemplified using the following
terms. Initially one would randomly fragment an amount
of the nucleic acid to be sequenced to provide a mixture
of nucleic acid fragments of length T. One would prepare
an array of immobilized oligonucleotide probes of known
sequences and length F and a set of labelled
oligonucleotide probes in solution of known sequences and
length P, wherein F + P < T and, preferably, wherein
T ≈ 3F.

One would then contact the array of immobilized
oligonucleotide probes with the mixture of nucleic acid
fragments under hybridization conditions effective to
allow the formation of primary complexes with hybridized,
complementary sequences of length F and non-hybridized
fragment sequences of length T - F. Preferably, the
hybridized sequences of length F would contain only
completely complementary sequences.

The primary complexes would then be contacted with
the set of labelled oligonucleotide probes under
hybridization conditions effective to allow the formation
of secondary complexes with hybridized, complementary
sequences of length F and adjacent hybridized,
complementary sequences of length P. In preferred
embodiments, only those labelled probes with completely
complementary sequences would be allowed to hybridize and
only those probes that hybridize immediately adjacent to
an immobilized probe would be allowed to remain
hybridized. In the most-preferred embodiments, the
adjacent immobilized and labelled oligonucleotide probes
would also be ligated at this stage.

Next one would detect the secondary complexes by
detecting the presence of the label and identify
sequences of length F + P from the nucleic acid fragments
in the secondary complexes by combining the known
sequences of the hybridized immobilized and labelled
probes. Stretches of the sequences of length F + P that
overlap would then be identified, thereby allowing the
complete nucleic acid sequence of the molecule to be
reconstructed or assembled from the overlapping sequences
determined.

In the methods of the invention, the
oligonucleotides of the first set may be attached to a
solid support, i.e., immobilized, by any of the methods
known to those of skill in the art. For example,
attachment may be via addressable laser-activated
photodeprotection (Fodor et al., 1991; Pease et al.,
1994). One generally preferred method is to attach the
oligos through the phosphate group using reagents such as
nucleoside phosphoramidite or nucleoside hydrogen
phosphonate, as described by Southern & Maskos (PCT
Patent Application WO 90/03382, incorporated herein by
reference), and using glass, nylon or teflon supports.
Another preferred method is that of light-generated
synthesis described by Pease et al. (1994; incorporated
herein by reference). One may also purchase support
bound oligonucleotide arrays, for example, as have been
offered for sale by Affymetrix and Beckman.

The immobilized oligonucleotides may be formed into
an array comprising all probes or subsets of probes of a
given length (preferably about 4 to 10 bases), and more
preferably, into multiple arrays of immobilized
oligonucleotides arranged to form a so-called "sequencing
chip". One example of a chip is that where hydrophobic
segments are used to create distinct spatial areas. The
sequencing chips may be designed for different
applications like mapping, partial sequencing, sequencing
of targeted regions for diagnostic purposes, mRNA
sequencing and large scale genome sequencing. For each
application, a specific chip may be designed with
different sized probes or with an incomplete set of
probes.

In one exemplary embodiment, both sets of
oligonucleotide probes would be probes of six bases in
length, i.e., 6-mers. In this instance, each set of
oligos contains 4096 distinct probes. The first set of
probes is preferably fixed in an array on a microchip,
most conveniently arranged in 64 rows and 64 columns.
The second set of 4096 oligos would be labeled with a
detectable label and dispensed into a set of distinct
tubes. In this example, 4096 of the chips would be
combined in a large array, or several arrays. After
hybridizing the nucleic acid fragments, a small amount of
the labeled oligonucleotides would be added to each
microchip for the second hybridization step; only one of
the 4096 labeled oligonucleotides would be added to each
microchip.
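
The 64 x 64 arrangement can be illustrated by mapping each 6-mer
to a spot position. The text does not specify how probes are
ordered on the chip, so the base-4 indexing scheme below (A=0,
C=1, G=2, T=3, filled row by row) and the function names are
assumptions chosen only to show that 4096 probes fill such a grid
exactly.

```python
BASES = "ACGT"

def probe_index(probe):
    """Read the probe as a base-4 number (A=0, C=1, G=2, T=3)."""
    index = 0
    for base in probe:
        index = index * 4 + BASES.index(base)
    return index

def spot_position(probe, columns=64):
    """Return the (row, column) of a probe in a row-major grid."""
    return divmod(probe_index(probe), columns)

def probe_at(row, col, length=6, columns=64):
    """Inverse mapping: recover the probe located at (row, col)."""
    index = row * columns + col
    bases = []
    for _ in range(length):
        index, digit = divmod(index, 4)
        bases.append(BASES[digit])
    return "".join(reversed(bases))

print(spot_position("AAAAAA"), spot_position("TTTTTT"))
```

With 4096 such chips, each receiving a different labelled 6-mer,
the combined array interrogates 4096 x 4096 = 16,777,216 probe
pairs, i.e., every possible 12-mer.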

Further embodiments of the invention include kits
for use in nucleic acid sequencing. Such kits will
generally comprise a solid support having attached an
array of oligonucleotide probes of known sequences, as
shown in FIG. 2A, FIG. 2B and FIG. 2C, wherein the
oligonucleotides are capable of taking part in
hybridization reactions, and a set of containers
comprising solutions of labelled oligonucleotide probes
of known sequences. Arrangements such as those shown in
FIG. 4 are also contemplated. This depicts the use of
the Universal Base, either as an attachment method, or at
the terminus to give an added dimension to the
hybridization of fragments.

In the kits, the attached oligonucleotide probes and
those in solution may be between about 4 bp and about
9 bp in length, with ones of about 6 bp in length being
preferred. The oligos may be labelled with
chemically-detectable or radioactive labels, with 32P-labelled
probes being generally preferred, and 33P-labelled probes
being even more preferred. The kits may also comprise a
chemical or other ligating agent, such as a DNA ligase
enzyme. A variety of other additional compositions and
materials may be included in the kits, such as 96-tip or
96-pin devices, buffers, reagents for cutting long
nucleic acid molecules and tools for the size selection
of DNA fragments. The kits may even include labelled RNA
probes so that the probes may be removed by RNase
treatment and the sequencing chips re-used.



BRIEF DESCRIPTION OF THE DRAWINGS 



FIG. 1. Basic steps in the hybridization process.
Step 1: The unlabelled target DNA to be sequenced (T) is
hybridized under discriminative conditions to an array of
attached oligonucleotide probes. Spots with probes Fx and
Fy are depicted. Complementary sequences for Fx and Fy
are at different positions of T. Step 2: Labeled probes,
Pi (one probe per chip), are hybridized to the array.
Depicted is a probe that has a complementary target on T
that is adjacent to Fx but not to Fy. Step 3: By
applying discriminative conditions or reagents, complexes
with no adjacent probes are selectively melted. A
particular example is the ligation of a labelled probe to
a fixed probe, when the labelled probe hybridizes "back
to back" with the attached probe. Positive signals are
detected only in the case of adjacent probes, like Fx and
Pi, and in a particular example, only in the case of
ligated probes.

FIG. 2A, FIG. 2B and FIG. 2C represent components of
an exemplary sequencing kit.

FIG. 2A. Sequencing chips, representing an array of
4^P identical sections each containing identical (or
different) arrays of oligonucleotides. Sections can be
separated by physical barriers or by hydrophobic strips.
4,000-16,000 oligochips are contemplated to be in the
array.

FIG. 2B is an enlargement of a chip section
containing 4^F spots, each with a particular
oligonucleotide probe (4,000-16,000) synthesized or
spotted on that area. Spots can be as small as several
microns and the size of the section is about 1 mm to
about 10 mm.

FIG. 2C represents a set of tubes, or one or more
multiwell plates, with an appropriate number of wells (in
this case 4^P wells). Each well contains an amount of a
specific labeled oligonucleotide. Additional amounts of
the probes can be stored unlabeled if the labeling is not
done during synthesis; in this case a sequencing kit will
contain the necessary components for probe labeling. The
lines connecting the tubes/wells with the chip sections
depict a step in the sequencing procedure where an amount
of a labeled probe is transferred to a chip section. The
transferring can be done by pipetting (single or multi-
channel) or by a pin array transferring liquid by surface
tension. Transferring tools can also be included in the
sequencing kit.

FIG. 3A, FIG. 3B and FIG. 3C. Hybridization of DNA
fragments produced by a random cutting of an amount of a
DNA molecule. In FIG. 3A, DNA fragment T1 is such that
it contains complete targets for both the fixed and the
non-fixed, labeled probes. FIG. 3B represents the case
where the DNA fragment T is not appropriately cut. In
FIG. 3C, there is enough space for probe P to hybridize,
but the adjacent sequence is not complementary to it. In
both case B and case C, the signal will be reduced due to
saturation of the molecules of attached probe F.
Simultaneous hybridization with DNA fragments and labeled
probes and cycling of the hybridization process are some
possible ways to increase the yield of correct adjacent
hybridizations.

WO 95/09248    PCT/US94/10945

FIG. 4. Use of Universal Base as a linker or in the
terminal position for hybridization. The universal bases
(M base, Nichols et al., 1994) or all four bases may be
added in the probe synthesis. This is a way to increase
the length of the probes, and thus the stability of the
duplexes, without increasing the number of probes. Also,
the use of universal bases at the free end of probes
provides a spacer that allows the sequence to be read in
a different frame.



DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 


Determining the sequences of nucleic acid molecules
is of vital use in all areas of basic and applied
biological research (Drmanac & Crkvenjakov, 1990). The
present invention provides new and efficient methods for
use in sequencing and analyzing nucleic acid molecules.
One intended use for this methodology is, in conjunction
with other sequencing techniques, for work on the Human
Genome Project (HGP).

Presently, two methods of sequencing by
hybridization (SBH) are known. In the first, Format 1,
unknown genomic DNAs or oligonucleotides of up to about
100-2000 nucleotides in length are arrayed on a solid
substrate. These DNAs are then interrogated by
hybridization with a set of labeled probes which are
generally 6- to 8-mers. In the inverse technique, Format
2, oligomers of 6 to 8 nucleotides are immobilized on a
solid support and allowed to anneal to pieces of cloned
and labeled DNA.

In either type of SBH analysis, many steps must be
included in order to arrive at a definitive sequence.
Particular problems of current SBH methods are those
associated with the synthesis of large numbers of probes
and the difficulties of effective discriminative
hybridization. Full match-mismatch discrimination is
difficult for two main reasons. Firstly, an end
mismatch in probes longer than 10 bases is poorly
discriminated, and secondly, the complex mixture of
labeled DNA segments that results when analyzing a long
DNA fragment generates a high background.

The present invention provides effective
discriminative hybridization without large numbers of
probes or probes of increased length, and also eliminates
many of the labeling and cloning steps which are
particular disadvantages of each of the known SBH
methods. The disclosed highly efficient nucleic acid
sequencing methods, termed Format 3 sequencing, are based
upon hybridization with two sets of small oligonucleotide
probes of known sequences, and thus at least double the
length of sequence that can be determined. These methods
allow extremely large nucleic acid molecules, including
chromosomes, to be sequenced and solve various other SBH
problems such as, for example, the attachment or
labelling of many nucleic acid fragments. The invention
is extremely powerful as it may also be used to sequence
RNA and even unamplified RNA samples.

Subsequent to the present invention, as disclosed in
U.S. Serial No. 08/127,420 and in Drmanac (1994), another
variation of SBH was described, termed positional SBH
(PSBH) (Broude et al., 1994). PSBH is basically a
variant of Format 2 SBH (in which oligonucleotides of
known sequences are immobilized and used to hybridize to
nucleic acids of unknown sequence that have been
previously labelled). In PSBH, the immobilized probes,
rather than being simple, single-stranded probes, are
duplexes that contain single-stranded 3' overhangs.
Biotinylated duplex probes are immobilized on
streptavidin-coated magnetic beads, to form a type of
immobilized probe, and then mixed with 32P-labeled target
nucleic acids to be sequenced. T4 DNA ligase is then
added to ligate any hybridized target DNA to the shorter
end of the duplex probe.

However, despite representing an interesting
approach, PSBH (as reported by Broude et al., 1994) does
not reflect a significant advance over the existing SBH
technology. For example, unlike the Format 3 methodology
of the present invention, PSBH does not extend the length
of sequence that can be determined in one round of the
method. PSBH also maintains the burdensome requirement
for labelling the unknown target DNA, which is not
required for Format 3. In general, PSBH is proposed for
use in comparative studies or in mapping, rather than in
de novo genome sequencing. It thus differs significantly
from Format 3 which, although widely applicable to all
areas of sequencing, is a very powerful tool for use in
sequencing even the largest of genomes.

The nucleic acids to be sequenced may first be
fragmented. This may be achieved by any means including,
for example, cutting by restriction enzyme digestion,
particularly with CviJI as described by Fitzgerald et
al. (1992); shearing by physical means such as ultrasound
treatment; NaOH treatment; and the like. If desired,
fragments of an appropriate length, such as between about
10 bp and about 40 bp, may be cut out of a gel. The
complete nucleic acid sequence of the original molecule,
such as a human chromosome, would be determined by
defining the F + P sequences present in the original
molecule and assembling portions of overlapping F + P
sequences. This does not, therefore, require an
intermediate step of determining fragment sequences;
rather, the sequence of the whole molecule is constructed
from the F + P sequences delineated.

For the purposes of the following discussion, it
will be generally assumed that four bases make up the
sequences of the nucleic acids to be sequenced. These
are A, G, C and T for DNA and A, G, C and U for RNA.
However, it may be advantageous in certain embodiments to
use modified bases in the small oligonucleotide probes.
To carry out the invention, one would generally first
prepare a number of small oligonucleotide probes of
defined length that cover all combinations of sequences
for that length of probe. This number is represented by
4^N (4 to the power N) where the length of the probe is
termed N. For example, there are 4096 possible sequences
for a 6-mer probe (4^6 = 4096).
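
By way of illustration only, the complete set of probes of a
given length can be enumerated programmatically. The following
Python sketch is not part of the disclosed method; the function
name all_probes and the lexicographic ordering are assumptions
made for the example.

```python
from itertools import product

BASES = "ACGT"  # DNA alphabet; "ACGU" would be used for RNA probes


def all_probes(n):
    """Return every sequence of length n over BASES (4**n in total)."""
    return ["".join(p) for p in product(BASES, repeat=n)]


hexamers = all_probes(6)
print(len(hexamers))  # 4096, i.e. 4**6
```

The same call with n = 7 or n = 8 gives the 16,384 and 65,536
probes that a complete 7-mer or 8-mer set would require.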

One set of such probes of length F (4^F) would be
fixed in a square array on a microchip, which may be in
the range of 1 mm^2 to 1 cm^2. In the present example,
these would be arranged in 64 rows and 64 columns.
Naturally, one would ensure that the oligo probes were
attached, or otherwise immobilized, to the microchip
surface so that they were able to take part in
hybridization reactions. Another set of oligos of length
P, 4^P in number, would also be synthesized. The oligos
in this "P set" would be labeled with a detectable label
and would be dispensed into a set of tubes (FIG. 2A,
FIG. 2B and FIG. 2C).
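
One possible way to address the 64 x 64 array is sketched below
in Python. The text specifies only the number of rows and
columns; the particular mapping used here (the first three bases
select the row and the last three the column, with the base
order A, C, G, T) and the function names are assumptions made
for the example.

```python
BASES = "ACGT"  # assumed base order for indexing


def probe_to_index(probe):
    """Read the probe as a base-4 number (A=0, C=1, G=2, T=3)."""
    idx = 0
    for base in probe:
        idx = idx * 4 + BASES.index(base)
    return idx


def probe_to_spot(probe, n_cols=64):
    """Return the (row, column) of a probe in a row-major array."""
    return divmod(probe_to_index(probe), n_cols)


def spot_to_probe(row, col, length=6, n_cols=64):
    """Inverse mapping: recover the probe sequence from its spot."""
    idx = row * n_cols + col
    bases = []
    for _ in range(length):
        idx, rem = divmod(idx, 4)
        bases.append(BASES[rem])
    return "".join(reversed(bases))


print(probe_to_spot("TTTTTT"))  # (63, 63)
```

With such a mapping, the position of a signal on a chip
identifies the fixed probe directly, without a lookup table.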

4^P of the chips would be combined in a large array
(or several arrays of approximately 10-100 cm^2, for a
convenient size), where P corresponds to the length of
oligonucleotides in the second oligomer set (FIG. 2A,
FIG. 2B and FIG. 2C). Again, as a convenient example, P
is chosen to be six (P = 6).

The nucleic acids to be sequenced would be
fragmented to give smaller nucleic acid fragments of
unknown sequence. The average length of these fragments,
termed T, should generally be greater than the combined
length of F and P and may be about three times the length
of F (i.e., F + P ≤ T and T ≈ 3F). In the present
example, one would aim to produce nucleic acid fragments
of approximately 20 base pairs in length. These
fragments would be denatured and added to the large
arrays under conditions that facilitate hybridization of
complementary sequences.

In the simplest and currently preferred form of the
invention, hybridization conditions would be chosen that
would allow significant hybridization to occur only if 6
sequential nucleotides in a nucleic acid fragment were
complementary to all 6 nucleotides of an F
oligonucleotide probe. Such hybridization conditions
would be determined by routine optimization pilot studies
in which conditions such as the temperature, the
concentration of various components, the length of time
of the steps, and the buffers used, including the pH of
the buffer, are optimized.

At this stage, each microchip would contain certain
hybridized complexes. These would be in the form of
probe:fragment complexes in which the entire sequence of
the probe is hybridized to the fragment, but in which the
fragment, being longer, has some non-hybridized sequences
that form a "tail" or "tails" to the complex. In this
example, the complementary hybridized sequences would be
of length F and the non-hybridized sequences would total
T - F in length. The complementary portion of the
fragment may be at or towards an appropriate end, so that
a single longer non-hybridized tail is formed.
Alternatively, the complementary portion of the fragment
may be towards the opposite end, so that two
non-hybridized tails are formed (FIG. 3A, FIG. 3B and
FIG. 3C).

After washing to remove the non-complementary
nucleic acid fragments that did not hybridize, a small
amount of the labeled oligonucleotides in set P would be
added to each microchip for hybridization to the nucleic
acid fragment tails of unknown sequence that protrude
from the probe:fragment complexes. Only one of the 4^P
oligonucleotides would be added to each microchip.
Again, it is currently preferred to use hybridization
conditions that would allow significant binding to occur
only if all the 6 nucleotides of a labelled probe were
complementary to 6 sequential nucleotides of a nucleic
acid fragment tail. The hybridization conditions would
be determined by pilot studies, as described above, in
which components such as the temperature, concentration,
time, buffers and the like, are optimized.

At this stage, each microchip would then contain
certain 'secondary hybridized complexes'. These would be
in the form of probe:fragment:probe complexes in which
the entire sequence of each probe is hybridized to the
fragment, and in which the fragment likely has some non-
hybridized sequences. In these secondary hybridized
complexes the immobilized probe and the labelled probe
may be hybridized to the fragment so that the two probes
are immediately adjacent or "back to back". However,
given that the fragments will generally be longer than
the sum of the lengths of the probes, the immobilized
probe and the labelled probe may be hybridized to the
fragment in non-adjacent positions separated by one or
more bases.

The large arrays would then be treated by a process
to remove the non-hybridized labelled probes. In
preferred embodiments, the process employed would remove
not only the non-hybridized labelled probes, but also the
non-adjacently-hybridized labelled probes from the array.
The process would employ discriminating conditions to
allow those secondary hybridization complexes that
include adjacent immobilized and labelled probes to be
discriminated from those secondary hybridization
complexes in which the nucleic acid fragment is
hybridized to two probes but in which the probes are not
adjacent. This is an important aspect of the invention
in that it will allow the ultimate delineation of a
section of fragment sequence corresponding to the
combined sequences of the immobilized probe and the
labelled probe.
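
The distinction between adjacent and non-adjacent secondary
complexes can be illustrated with a short Python sketch. This is
a simplified model, not the disclosed procedure: the fragment and
the two probe binding sites are all written as target-strand
sequences (the probes themselves would be their complements),
matching is exact, orientation is ignored, and the function
names are hypothetical.

```python
def find_all(text, sub):
    """Start indices of every (possibly overlapping) occurrence of sub."""
    hits, i = [], text.find(sub)
    while i != -1:
        hits.append(i)
        i = text.find(sub, i + 1)
    return hits


def classify(fragment, f_site, p_site):
    """Classify the complex formed by a fixed and a labelled probe."""
    gaps = []  # bases separating each non-overlapping pair of sites
    for f in find_all(fragment, f_site):
        for p in find_all(fragment, p_site):
            if p >= f + len(f_site):
                gaps.append(p - (f + len(f_site)))
            elif f >= p + len(p_site):
                gaps.append(f - (p + len(p_site)))
    if not gaps:
        return "no secondary complex"
    return "adjacent" if 0 in gaps else "non-adjacent"


fragment = "TTGACCTAGGCATCGATTCA"
print(classify(fragment, "GACCTA", "GGCATC"))  # adjacent
print(classify(fragment, "GACCTA", "CATCGA"))  # non-adjacent
```

Only the first case would be expected to retain label after the
discriminating treatment, and only that case yields a contiguous
F + P sequence.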

The discrimination process employed to remove non-
hybridized and non-adjacently-hybridized probes from the
array whilst leaving the adjacently-hybridized probes
attached may again be a controlled washing process. The
adjacently-hybridized probes would be unaffected by the
chosen conditions by virtue of their increased stability
due to the stacking interactions of the adjacent
nucleotides. However, in preferred embodiments, it is
contemplated that one would treat the large arrays so
that any adjacent probes would be covalently joined,
e.g., by treating with a solution containing a chemical
ligating agent or, more preferably, a ligase enzyme, such
as T4 DNA ligase (Landegren et al., 1988; Wu & Wallace,
1989).

In any event, the complete array would be subjected
to stringent washing so that the only label left
associated with the array would be in the form of double-
stranded probe-fragment-probe complexes with adjacent
hybridized portions of length F + P (i.e., 12 nucleotides
in the present example). Using this two-step
hybridization reaction, very high discrimination is
possible because three or four independent discriminative
processes are taken into account: discriminative
hybridization of fragment T to the F-base-long probe;
discriminative hybridization of the P-base-long probe to
fragment T; discriminative stability of the full match
(F + T + P) hybrid in comparison to P hybrids or even to
mismatched hybrids containing non-adjacent F + P probes;
and discriminative ligation of the two end bases of F
and P.

One would then detect the so-called adjacent
secondary complexes by observing the location of the
remaining label on the array. From the position of the
label, F + P (e.g., 12) nucleotide long sequences from
the fragment could be determined by combining the known
sequences of the immobilized and labelled probes. The
complete nucleic acid sequence of the original molecule,
such as a human chromosome, could then be reconstructed
or assembled from the overlapping F + P sequences thus
determined.
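
The assembly of overlapping F + P sequences can be sketched in
Python as follows. The sketch is illustrative only and rests on
several assumptions that are not part of the disclosure: signals
are error-free, the target is a single linear molecule, no
(F + P - 1)-mer occurs twice in the target, and sequences are
written on the target strand rather than as complementary
probes. The function names read_signals and assemble are
hypothetical.

```python
def read_signals(target, f_len=6, p_len=6):
    """Idealized positive signals: one (F, P) pair per adjacent site."""
    k = f_len + p_len
    return {(target[i:i + f_len], target[i + f_len:i + k])
            for i in range(len(target) - k + 1)}


def assemble(signals):
    """Rebuild the target by chaining (F + P)-mers that overlap by k - 1."""
    kmers = {f + p for f, p in signals}
    k = len(next(iter(kmers)))
    suffixes = {s[1:] for s in kmers}
    # The starting k-mer is the one that nothing else runs into.
    starts = [s for s in kmers if s[:-1] not in suffixes]
    if len(starts) != 1:
        raise ValueError("no unique starting sequence")
    seq, used = starts[0], {starts[0]}
    while True:
        nxt = [s for s in kmers - used if s[:-1] == seq[-(k - 1):]]
        if len(nxt) != 1:  # end of the target, or an ambiguous branch
            break
        seq += nxt[0][-1]
        used.add(nxt[0])
    return seq


target = "ATGGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGT"
signals = read_signals(target)
print(assemble(signals) == target)  # True
```

In this idealized model, each signal contributes one base of new
sequence beyond its 11-base overlap with its neighbour. Where the
target contains repeats longer than the overlap, the chain
branches and the sketch stops at the branch point rather than
guessing.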

When ligation is employed in the sequencing process,
as is currently preferred, then the ordinary
oligonucleotide chip cannot be reused. The inventor
contemplates that this will not be limiting as various
methods are available for recycling. For example, one
may generate a specifically cleavable bond between the
probes and then cleave the bond after detection.
Alternatively, one may employ ribonucleotides for the
second probe, probe P, or use a ribonucleotide for the
joining base in probe P, so that this probe may
subsequently be removed by RNAase or uracil-DNA
glycosylase treatment (Craig et al., 1989). Other
contemplated methods are to establish bonds by chemical
ligation which can be selectively cut (Dolinnaya et al.,
1988).

Further variations and improvements to this
sequencing methodology are also contemplated and fall
within the scope of the present invention. This includes
the use of modified oligonucleotides to increase the
specificity or efficiency of the methods, similar to that
described by Hoheisel & Lehrach (1990). Cycling
hybridizations, as used in PCR technology, can also be
employed to increase the hybridization signal. In
these cases, one would use cycles with different
temperatures to re-hybridize certain probes. The
invention also provides for determining shifts in reading
frames by using equimolar amounts of probes that have a
different base at the end position. For example, one may
use equimolar amounts of 7-mers in which the first six
bases are the same defined sequence and the last position
may be A, T, C or G in the alternative.
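
As a small illustration of the equimolar 7-mer pools just
described, the following Python sketch lists the four members of
one such pool. The function name and the example core sequence
are hypothetical.

```python
BASES = "ATCG"


def frame_shift_pool(core):
    """Return the four probes sharing `core` plus one variable end base."""
    return [core + base for base in BASES]


pool = frame_shift_pool("GATTAC")
print(pool)  # ['GATTACA', 'GATTACT', 'GATTACC', 'GATTACG']
```

Each member would make up one quarter of the pool, so that the
pool as a whole binds wherever the six defined bases match,
whatever base follows them.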


The following examples are included to demonstrate
preferred embodiments of the invention. It should be
appreciated by those of skill in the art that the
techniques disclosed in the examples that follow
represent techniques discovered by the inventor to
function well in the practice of the invention, and thus
can be considered to constitute preferred modes for its
practice. However, those of skill in the art should, in
light of the present disclosure, appreciate that many
changes can be made in the specific embodiments that are
disclosed and still obtain a like or similar result
without departing from the spirit and scope of the
invention.



EXAMPLE I

PREPARATION OF SUPPORT BOUND OLIGONUCLEOTIDES 

Oligonucleotides, i.e., small nucleic acid segments,
may be readily prepared by, for example, directly
synthesizing the oligonucleotide by chemical means, as is
commonly practiced using an automated oligonucleotide
synthesizer.

Support bound oligonucleotides may be prepared by
any of the methods known to those of skill in the art
using any suitable support such as glass, polystyrene or
teflon. One strategy is to precisely spot
oligonucleotides synthesized by standard synthesizers.
Immobilization can be achieved using passive adsorption
(Inouye & Hondo, 1990); using UV light (Nagata et al.,
1985; Dahlen et al., 1987; Morrissey & Collins, 1989); or
by covalent binding of base modified DNA (Keller et al.,
1988; 1989); all references being specifically
incorporated herein.

Another strategy that may be employed is the use of
the strong biotin-streptavidin interaction as a linker.
For example, Broude et al. (1994) describe the use of
biotinylated probes, although these are duplex probes,
that are immobilized on streptavidin-coated magnetic
beads. Streptavidin-coated beads may be purchased from
Dynal, Oslo. Of course, this same linking chemistry is
applicable to coating any surface with streptavidin.
Biotinylated probes may be purchased from various
sources, such as, e.g., Operon Technologies
(Alameda, CA).

Nunc Laboratories (Naperville, IL) also sells
suitable material that could be used. Nunc Laboratories
have developed a method, termed CovaLink NH, by which DNA
can be covalently bound to the microwell surface.
CovaLink NH is a polystyrene surface grafted with
secondary amino groups (>NH) that serve as bridge-heads
for further covalent coupling. CovaLink Modules may be
purchased from Nunc Laboratories. DNA molecules may be
bound to CovaLink exclusively at the 5'-end by a
phosphoramidate bond, allowing immobilization of more
than 1 pmol of DNA (Rasmussen et al., 1991).

The use of CovaLink NH strips for covalent binding
of DNA molecules at the 5'-end has been described
(Rasmussen et al., 1991). In this technology, a
phosphoramidate bond is employed (Chu et al., 1983).
This is beneficial as immobilization using only a single
covalent bond is preferred. The phosphoramidate bond
joins the DNA to the CovaLink NH secondary amino groups
that are positioned at the end of 2 nm long spacer arms
covalently grafted onto the polystyrene surface. To link
an oligonucleotide to CovaLink NH via a phosphoramidate
bond, the oligonucleotide terminus must have a 5'-end
phosphate group. It is, perhaps, even possible for
biotin to be covalently bound to CovaLink and then
streptavidin used to bind the probes.

More specifically, the linkage method includes
dissolving DNA in water (7.5 ng/µl) and denaturing for 10
min. at 95°C and cooling on ice for 10 min. Ice-cold 0.1
M 1-methylimidazole, pH 7.0 (1-MeIm7), is then added to a
final concentration of 10 mM 1-MeIm7. The ssDNA solution
is then dispensed into CovaLink NH strips (75 µl/well)
standing on ice.

Carbodiimide, 0.2 M 1-ethyl-3-(3-
dimethylaminopropyl)-carbodiimide (EDC), dissolved in 10
mM 1-MeIm7, is made fresh and 25 µl added per well. The
strips are incubated for 5 hours at 50°C. After
incubation the strips are washed using, e.g., Nunc-Immuno
Wash; first the wells are washed 3 times, then they are
soaked with washing solution for 5 min., and finally they
are washed 3 times (wherein the washing solution is 0.4 N
NaOH, 0.25% SDS heated to 50°C).

It is contemplated that a further suitable method
for use with the present invention is that described in
PCT Patent Application WO 90/03382 (Southern & Maskos),
incorporated herein by reference. This method of
preparing an oligonucleotide bound to a support involves
attaching a nucleoside 3'-reagent through the phosphate
group by a covalent phosphodiester link to aliphatic
hydroxyl groups carried by the support. The
oligonucleotide is then synthesized on the supported
nucleoside and protecting groups removed from the
synthetic oligonucleotide chain under standard conditions
that do not cleave the oligonucleotide from the support.
Suitable reagents include nucleoside phosphoramidite and
nucleoside hydrogen phosphonate.

In more detail, to use this method, a support, such
as a glass plate, is derivatized by contact with a
mixture of xylene, glycidoxypropyltrimethoxysilane, and a
trace of diisopropylethylamine at 90°C overnight. It is
then washed thoroughly with methanol and ether and air-
dried. The derivatized support is then heated with
stirring in hexaethyleneglycol containing a catalytic
amount of concentrated sulfuric acid, overnight in an
atmosphere of argon, at 80°C, to yield an alkyl hydroxyl
derivatized support. After washing with methanol and
ether, the support is dried under vacuum and stored under
argon at -20°C.

Oligonucleotide synthesis is then performed by hand
under standard conditions using the derivatized glass
plate as a solid support. The first nucleotide will be a
3'-hydrogen phosphate, used in the form of the
triethylammonium salt. This method results in support
bound oligonucleotides of high purity.

An on-chip strategy for the preparation of DNA probe
arrays may be employed. For example, addressable laser-
activated photodeprotection may be employed in the
chemical synthesis of oligonucleotides directly on a
glass surface, as described by Fodor et al. (1991),
incorporated herein by reference. Probes may also be
immobilized on nylon supports as described by Van Ness et
al. (1991); or linked to teflon using the method of
Duncan & Cavalier (1988); all references being
specifically incorporated herein.

Fodor et al. (1991) describe the light-directed
synthesis of dinucleotides which is applicable to the
spatially directed synthesis of complex compounds for use
in the microfabrication of devices. This is based upon a
method that uses light to direct the simultaneous
synthesis of chemical compounds on a solid support. The
pattern of exposure to light or other forms of energy
through a mask, or by other spatially addressable means,
determines which regions of the support are activated for
chemical coupling. Activation by light results from the
removal of photolabile protecting groups from selected
areas. After deprotection, a first compound bearing a
photolabile protecting group is exposed to the entire
surface, but reaction occurs only with regions that were
addressed by light in the preceding step. The substrate
is then illuminated through a second mask, which
activates a different region for reaction with a second
protected building block. The pattern of masks used in
these illuminations and the sequence of reactants define
the ultimate products and their locations. A high degree
of miniaturization is possible with the Fodor method
because the density of synthesis sites is bounded only by
physical limitations on spatial addressability, i.e., the
diffraction of light. Each compound is accessible and
its position is precisely known; hence, an oligo chip
made in this way would be ready for use in SBH.
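
How a pattern of masks and a sequence of reactants together
define the products at each site can be illustrated with the
following Python sketch. It is a deliberately simplified model
and not the procedure of Fodor et al.: each "mask" is simply the
set of sites that should receive a given base in a given cycle,
coupling is assumed to be complete, and protecting-group
chemistry and synthesis direction are ignored. All names are
hypothetical.

```python
from itertools import product

BASES = "ACGT"


def synthesize(targets):
    """Build the sequences in `targets` (site -> sequence) by masked coupling.

    Returns the finished chip and the number of coupling steps used.
    """
    length = len(next(iter(targets.values())))
    chip = {site: "" for site in targets}
    steps = 0
    for pos in range(length):      # one round per probe position
        for base in BASES:         # one mask and one reactant per base
            mask = {s for s, seq in targets.items() if seq[pos] == base}
            if not mask:
                continue
            for site in mask:      # only illuminated sites couple
                chip[site] += base
            steps += 1
    return chip, steps


# All 256 tetranucleotides, one per site.
targets = {i: "".join(p) for i, p in enumerate(product(BASES, repeat=4))}
chip, steps = synthesize(targets)
print(chip == targets, steps)  # True 16
```

In this model a complete set of 4^N probes of length N needs at
most 4N coupling steps, which is why masked synthesis scales so
much better than preparing each probe separately.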

Fodor et al. (1991) describe the light-activated
formation of a dinucleotide as follows. 5'-Nitroveratryl
thymidine was synthesized from the 3'-O-thymidine
acetate. After deprotection with base, the 5'-
nitroveratryl thymidine was attached to an aminated
substrate through a linkage to the 3'-hydroxyl group.
The nitroveratryl protecting groups were removed by
illumination through a 500-µm checkerboard mask. The
substrate was then treated with phosphoramidite-activated
2'-deoxycytidine. In order to follow the reaction
fluorometrically, the deoxycytidine had been modified
with an FMOC-protected aminohexyl linker attached to the
exocyclic amine. After removal of the FMOC protecting
group with base, the regions that contained the
dinucleotide were fluorescently labeled by treatment of
the substrate with FITC. Therefore, following this
method, support bound oligonucleotides can be
synthesized.

To link an oligonucleotide to a nylon support, as
described by Van Ness et al. (1991), requires activation
of the nylon surface via alkylation and selective
activation of the 5'-amine of oligonucleotides with
cyanuric chloride, as follows. A nylon surface is
ethylated using triethyloxonium tetrafluoroborate to form
amine reactive imidate esters on the surface of the
nylon. 1-Methyl-2-pyrrolidone is used as a solvent. The
nylon surface is unpolished to effect the greatest
possible surface area.

The activated surface is then reacted with
poly(ethyleneimine) (Mr ~ 10K-70K) to form a polymer
coating that provides an extended amine surface for the
attachment of oligos. Amine-tailed oligonucleotide(s)
selectively react with excess cyanuric chloride,
exclusively on the amine tail, to give a 4,6-dichloro-
1,3,5-triazinyl-oligonucleotide(s) in quantitative yield.
The displacement of one chlorine moiety of cyanuric
chloride by the amino group significantly diminishes the
reactivity of the remaining chlorine groups. This
results in increased hydrolytic stability: the 4,6-
dichloro-1,3,5-triazinyl-oligonucleotide(s) are stable
for extended periods in buffered aqueous solutions (pH
8.3, 4°C, 1 week) and are readily isolated and purified
by size exclusion chromatography or ultrafiltration.

The reaction is specific for the amine tail with no
apparent reaction on the nucleotide moieties. The PEI-
coated nylon surface is then reacted with the cyanuric
chloride activated oligonucleotide. High concentrations
of the 'capture' sequence are readily immobilized on the
surface and the unreacted amines are capped with succinic
anhydride in the final step of the derivatization
process.

One particular way to prepare support bound
oligonucleotides is to utilize the light-generated
synthesis described by Pease et al. (1994, incorporated
herein by reference). These authors used current
photolithographic techniques to generate arrays of
immobilized oligonucleotide probes (DNA chips). These
methods, in which light is used to direct the synthesis
of oligonucleotide probes in high-density, miniaturized
arrays, utilize photolabile 5'-protected N-acyl-
deoxynucleoside phosphoramidites, surface linker
chemistry and versatile combinatorial synthesis
strategies. A matrix of 256 spatially defined
oligonucleotide probes may be generated in this manner
and then used in the advantageous Format 3 sequencing, as
described herein.

Pease et al. (1994) presented a strategy suitable
for use in light-directed oligonucleotide synthesis. In
this method, the surface of a solid support modified with
photolabile protecting groups is illuminated through a
photolithographic mask, yielding reactive hydroxyl groups
in the illuminated regions. A 3'-O-phosphoramidite-
activated deoxynucleoside (protected at the 5'-hydroxyl
with a photolabile group) is then presented to the
surface and coupling occurs at sites that were exposed to
light. Following capping and oxidation, the substrate
is rinsed and the surface is illuminated through a second
mask, to expose additional hydroxyl groups for coupling.
A second 5'-protected, 3'-O-phosphoramidite-activated
deoxynucleoside is presented to the surface. The
selective photodeprotection and coupling cycles are
repeated until the desired set of products is obtained.
Since photolithography is used, the process can be
miniaturized to generate high-density arrays of
oligonucleotide probes, the sequence of which is known at
each site.

The synthetic pathway for preparing the necessary
5'-O-(α-methyl-6-nitropiperonyloxycarbonyl)-N-acyl-2'-deoxynucleoside
phosphoramidites (MeNPoc-N-acyl-2'-deoxynucleoside
phosphoramidites) involves, in the first step, an
N-acyl-2'-deoxynucleoside that reacts with
1-(2-nitro-4,5-methylenedioxyphenyl)ethyl-1-chloroformate
to yield 5'-MeNPoc-N-acyl-2'-deoxynucleoside. In the
second step, the 3'-hydroxyl reacts with 2-cyanoethyl
N,N'-diisopropylchlorophosphoramidite, using standard
procedures, to yield the
5'-MeNPoc-N-acyl-2'-deoxynucleoside-3'-O-(2-cyanoethyl-N,N-diisopropyl)phosphoramidites.
The photoprotecting group is stable under ordinary
phosphoramidite synthesis conditions and can be removed
with aqueous base. These reagents can be stored for long
periods under argon at 4°C.

Photolysis half-times of 28 s, 31 s, 27 s, and 18 s
for MeNPoc-dT, MeNPoc-dC(ibu), MeNPoc-dG(PAC), and
MeNPoc-dA(PAC), respectively, have been reported (Pease
et al., 1994). In lithographic synthesis, illumination
times of 4.5 min (9 x t1/2 of MeNPoc-dC) are therefore
recommended to ensure >99% removal of MeNPoc protecting
groups.
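
By way of illustration only, the recommended illumination
time can be checked against the reported half-times. The
following Python sketch assumes simple first-order
(exponential) photolysis, which is an assumption of this
example and not a statement taken from Pease et al.; the
names HALF_TIMES_S and fraction_removed are hypothetical.

```python
# Reported photolysis half-times in seconds (Pease et al., 1994, as quoted above).
HALF_TIMES_S = {
    "MeNPoc-dT": 28,
    "MeNPoc-dC": 31,
    "MeNPoc-dG": 27,
    "MeNPoc-dA": 18,
}

ILLUMINATION_S = 4.5 * 60  # recommended illumination time, 4.5 min


def fraction_removed(half_time_s, illumination_s):
    """Fraction of protecting groups removed after illumination.

    Assumes first-order decay: remaining = 0.5 ** (t / t_half).
    """
    return 1.0 - 0.5 ** (illumination_s / half_time_s)


if __name__ == "__main__":
    for name, t_half in HALF_TIMES_S.items():
        removed = fraction_removed(t_half, ILLUMINATION_S)
        print(f"{name}: {100 * removed:.2f}% removed")
```

Under that assumption, the slowest-cleaving monomer
(MeNPoc-dC) governs the exposure time, and 4.5 min
corresponds to roughly nine of its half-times.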

A suitable synthetic support is one consisting of a
5.1 x 7.6 cm glass substrate prepared by cleaning in
concentrated NaOH, followed by exhaustive rinsing in
water. The surfaces would then be derivatized for 2 hr
with a solution of 10% (vol/vol)
bis(2-hydroxyethyl)aminopropyltriethoxysilane (Petrarch
Chemicals, Bristol, PA) in 95% ethanol, rinsed thoroughly
with ethanol and ether, dried in vacuo at 40°C, and
heated at 100°C for 15 min. In such studies, a synthesis
linker would be attached by reacting derivatized
substrates with 4,4'-dimethoxytrityl
(DMT)-hexaethyloxy-O-cyanoethyl phosphoramidite.

In summary, to initiate the synthesis of an
oligonucleotide probe, the appropriate deoxynucleoside
phosphoramidite derivative would be attached to a
synthetic support through a linker. Regions of the
support are then activated for synthesis by illumination
through, e.g., 800 x 12800 apertures of a
photolithographic mask. Additional phosphoramidite
synthesis cycles may be performed (with DMT-protected
deoxynucleosides) to generate any required sequence, such
as any 4-, 5-, 6-, 7-, 8-, 9- or even 10-mer sequence.
Following removal of the phosphate and exocyclic amine
protecting groups with concentrated NH4OH for 4 hr, the
substrate may then be mounted in a water-jacketed
thermostatically controlled hybridization chamber, ready
for use.

Of course, one could easily purchase a DNA chip,
such as one of the light-activated chips described above,
from a commercial source. In this regard, one may
contact Affymetrix of Santa Clara, CA 95051, and
Beckman.

EXAMPLE II 

MODIFIED OLIGONUCLEOTIDES FOR USE IN PROBES 

Modified oligonucleotides may be used throughout the
procedures of the present invention to increase the
specificity or efficiency of hybridization. One way to
achieve this is the substitution of natural nucleotides
by base-modified analogues. For example, pyrimidines with
a halogen at the C5-position may be used. This is believed
to improve duplex stability by influencing base stacking.
2,6-diaminopurine may also be used to give a third
hydrogen bond in its base pairing with thymine, thereby
thermally stabilizing DNA duplexes. Using
2,6-diaminopurine is reported to lead to a considerable
improvement in the duplex stability of short oligomers.
Its incorporation is proposed to allow more stringent
conditions for primer annealing, thereby improving the
specificity of the duplex formation and suppressing
background problems, or to allow the use of shorter
oligomers.

The synthesis of the triphosphate versions of these
modified nucleotides is disclosed by Hoheisel & Lehrach
(1990, incorporated herein by reference). Briefly,
5-chloro-2'-deoxyuridine and 2,6-diaminopurine
2'-deoxynucleoside are purchased, e.g., from Sigma.
Phosphorylation is carried out as follows: 50 mg dry
2-NH2-dAdo is taken up in 500 µl dry triethyl phosphate,
stirring under argon. 25 µl POCl3 is added and the
mixture incubated at -20°C. In the meantime, 1 mmol
pyrophosphoric acid is dissolved in 0.95 ml
tri-n-butylamine and 2 ml methanol and dried in a rotary
evaporator. Subsequently it is dried by evaporation
twice from 5 ml pyridine, with 70 µl tri-n-butylamine
also added before the second time. Finally it is
dissolved in 2 ml dry dimethyl formamide.

After 90 min at -20°C, the phosphorylation mixture
is evaporated to remove excess POCl3 and the
tri-n-butylammonium pyrophosphate in dimethyl formamide is
added. Incubation is for 1.5 min at room temperature.
The reaction is stopped by addition of 5 ml 0.2 M
triethylammonium bicarbonate (pH 7.6) and kept on ice for
4 hours. For 5-Cl-dUrd, the conditions would be
identical, but 50 µl POCl3 would be added and the
phosphorylation carried out at room temperature for 4
hours.

After the hydrolysis, the mixture is evaporated, the
pH adjusted to 7.5, and extracted with 1 volume diethyl
ether. Separation of the products is, e.g., on a (2.5 x
20 cm) Q-Sepharose column using a linear gradient of 0.15
M to 0.8 M triethylammonium bicarbonate. Stored frozen,
the nucleotides are stable over long periods of time.

One may also use the non-discriminatory base
analogue, or universal base, as designed by Nichols
et al. (1994). This new analogue,
1-(2'-deoxy-β-D-ribofuranosyl)-3-nitropyrrole (designated
M), was generated for use in oligonucleotide probes and
primers for solving the design problems that arise as a
result of the degeneracy of the genetic code, or when only
fragmentary peptide sequence data are available. This
analogue maximizes stacking while minimizing
hydrogen-bonding interactions without sterically
disrupting a DNA duplex.

The M nucleoside analogue was designed to maximize
stacking interactions using aprotic polar substituents
linked to heteroaromatic rings, enhancing intra- and
inter-strand stacking interactions to lessen the role of
hydrogen bonding in base-pairing specificity. Nichols
et al. (1994) favored 3-nitropyrrole
2'-deoxyribonucleoside because of its structural and
electronic resemblance to p-nitroaniline, whose
derivatives are among the smallest known intercalators of
double-stranded DNA.

The dimethoxytrityl-protected phosphoramidite of
nucleoside M is also available for incorporation into
oligonucleotides used as primers for sequencing and
polymerase chain reaction (PCR). Nichols et al. (1994)
showed that a substantial number of nucleotides can be
replaced by M without loss of primer specificity.

A unique property of M is its ability to replace
long strings of contiguous nucleosides and still yield
functional sequencing primers. Sequences with three, six
and nine M substitutions have all been reported to give
readable sequencing ladders, and PCR with three different
M-containing primers all resulted in amplification of the
correct product (Nichols et al., 1994).

The ability of 3-nitropyrrole-containing
oligonucleotides to function as primers strongly suggests
that a duplex structure must form with complementary
strands. Optical thermal profiles obtained for the
oligonucleotide pairs d(5'-C2-T5XT5G2-3') and
d(5'-C2A5YA5G2-3') (where X and Y can be A, C, G, T or M)
were reported to fit the normal sigmoidal pattern observed
for the DNA double-to-single strand transition. The Tm
values of the oligonucleotides containing X-M base pairs
(where X was A, C, G or T, and Y was M) were reported to
all fall within a 3°C range (Nichols et al., 1994).

EXAMPLE III
PREPARATION OF SEQUENCING CHIPS AND ARRAYS

The present example describes physical embodiments
of sequencing chips contemplated by the inventor.

A basic example is using 6-mers attached to
50 micron surfaces to give a chip with dimensions of
3 x 3 mm, which can be combined to give an array of
20 x 20 cm. Another example is using 9-mer
oligonucleotides attached to 10 x 10 micron surfaces to
create a 9-mer chip with dimensions of 5 x 5 mm. 4000
units of such chips may be used to create a 30 x 30 cm
array. FIG. 2A, FIG. 2B and FIG. 2C illustrate yet
another example of an array in which 4,000 to 16,000
oligochips are arranged into a square array. A plate, or
collection of tubes, as also depicted, may be packaged
with the array as part of the sequencing kit.
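
For illustration, the approximate chip dimensions quoted
above can be reproduced from the number of possible
probes. The following Python sketch assumes that a chip
carries the complete set of 4^k probes of length k, one
probe per square feature, laid out in a square grid; these
layout assumptions and the function name chip_side_mm are
hypothetical and are not taken from the specification.

```python
import math


def chip_side_mm(k, feature_um):
    """Side length (mm) of a square chip holding all 4**k k-mers.

    Assumes one probe per square feature of side `feature_um` microns.
    4**k equals (2**k)**2, so the features always fill a square grid.
    """
    features_per_side = math.isqrt(4 ** k)
    return features_per_side * feature_um / 1000.0


if __name__ == "__main__":
    print("6-mers, 50 micron features:", chip_side_mm(6, 50), "mm per side")
    print("9-mers, 10 micron features:", chip_side_mm(9, 10), "mm per side")
```

On these assumptions, the 6-mer chip comes to about 3 mm
per side and the 9-mer chip to about 5 mm per side, in
line with the dimensions given above.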

The arrays may be separated physically from each
other or by hydrophobic surfaces. One possible way to
utilize the hydrophobic strip separation is to use
technology such as the Iso-Grid Microbiology System
produced by QA Laboratories, Toronto, Canada.

Hydrophobic grid membrane filters (HGMF) have been
in use in analytical food microbiology for about a decade,
where they exhibit unique attractions of extended
numerical range and automated counting of colonies. One
commercially-available grid is ISO-GRID™ from QA
Laboratories Ltd. (Toronto, Canada), which consists of a
square (60 x 60 mm) of polysulfone polymer (Gelman
Tuffryn HT-450, 0.45 µm pore size) on which is printed a
black hydrophobic ink grid consisting of 1600 (40 x 40)
square cells. HGMF have previously been inoculated with
bacterial suspensions by vacuum filtration and incubated
on the differential or selective media of choice.

Because the microbial growth is confined to grid
cells of known position and size on the membrane, the
HGMF functions more like an MPN apparatus than a
conventional plate or membrane filter. Peterkin et al.
(1987) reported that these HGMFs can be used to propagate
and store genomic libraries when used with a HGMF
replicator. One such instrument replicates growth from
each of the 1600 cells of the ISO-GRID and enables many
copies of the master HGMF to be made (Peterkin et al.,
1987).

Sharpe et al. (1989) also used ISO-GRID HGMF from QA
Laboratories, together with an automated HGMF counter
(MI-100 Interpreter) and RP-100 Replicator. They reported
a technique for maintaining and screening many microbial
cultures.

Peterkin and colleagues later described a method for
screening DNA probes using the hydrophobic grid-membrane
filter (Peterkin et al., 1989). These authors reported
methods for effective colony hybridization directly on
HGMFs. Previously, poor results had been obtained due to
the low DNA binding capacity of the polysulfone polymer
on which the HGMFs are printed. However, Peterkin et al.
(1989) reported that the binding of DNA to the surface of
the membrane was improved by treating the replicated and
incubated HGMF with polyethyleneimine, a polycation,
prior to contact with DNA. Although this early work uses
cellular DNA attachment, and has a different objective to
the present invention, the methodology described may be
readily adapted for Format 3 SBH.

In order to identify useful sequences rapidly,
Peterkin et al. (1989) used radiolabeled plasmid DNA from
various clones and tested its specificity against the DNA
on the prepared HGMFs. In this way, DNA from recombinant
plasmids was rapidly screened by colony hybridization
against 100 organisms on HGMF replicates which can be
easily and reproducibly prepared.

Two basic problems have to be solved: manipulation
of small (2-3 mm) chips, and parallel execution of
thousands of reactions. The solution of the
invention is to keep the chips and the probes in
corresponding arrays. In one example, chips containing
250,000 9-mers are synthesized on a silicon wafer in the
form of 8 x 8 mm plates (15 µm/oligonucleotide, Pease
et al., 1994) arrayed in 8 x 12 format (96 chips) with a
1 mm groove in between. Probes are added either by
multichannel pipet or pin array, one probe on one chip.
To score all 4000 6-mers, 42 chip arrays have to be used,
either using different ones, or by reusing one set of
chip arrays several times.
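
The figures in this example can be checked with simple
arithmetic. The Python sketch below is illustrative only;
it assumes square features tiled edge to edge on the
8 x 8 mm plate and uses hypothetical variable names. It
also shows that the rounded figures in the text (4000
6-mers, 42 arrays) correspond to an exact count of
4^6 = 4096 6-mers, for which 43 arrays of 96 chips would
be needed if every 6-mer is scored once.

```python
import math

CHIP_SIDE_UM = 8_000       # 8 x 8 mm plate, in microns
FEATURE_UM = 15            # 15 microns per oligonucleotide feature
CHIPS_PER_ARRAY = 8 * 12   # 96 chips arranged in 8 x 12 format

# Number of whole features that fit on one chip.
features_per_chip = (CHIP_SIDE_UM // FEATURE_UM) ** 2

all_9mers = 4 ** 9         # complete set of fixed 9-mer probes
all_6mers = 4 ** 6         # complete set of labeled 6-mer probes

# One labeled probe is applied per chip, so each array scores 96 probes.
arrays_exact = math.ceil(all_6mers / CHIPS_PER_ARRAY)
arrays_rounded = math.ceil(4000 / CHIPS_PER_ARRAY)

if __name__ == "__main__":
    print("features per chip:", features_per_chip)
    print("9-mers to place:  ", all_9mers)
    print("arrays for 4096 6-mers:", arrays_exact)
    print("arrays for ~4000 6-mers:", arrays_rounded)
```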

In the above case, using the earlier nomenclature of
the application, F = 9; P = 6; and F + P = 15. Chips may
have probes of formula BxNn, where x is a number of
specified bases B; and n is a number of non-specified
bases, so that x = 4 to 10 and n = 1 to 4. To achieve
more efficient hybridization, and to avoid potential
influence of any support oligonucleotides, the specified
bases can be surrounded by unspecified bases, thus
represented by a formula such as (N)nBx(N)m (FIG. 4).
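
As an illustration of the probe formula (N)nBx(N)m, a
probe with unspecified flanking positions can be regarded
as a pool of all concrete sequences sharing the same
specified core. The Python sketch below is a minimal
example under that reading; the function name
expand_probe and the example core sequence are
hypothetical.

```python
from itertools import product


def expand_probe(specified, n_left=0, n_right=0, alphabet="ACGT"):
    """Yield every concrete sequence of a probe of form (N)n Bx (N)m.

    `specified` is the Bx core; `n_left` and `n_right` are the numbers
    of unspecified (N) positions on either side of the core.
    """
    for left in product(alphabet, repeat=n_left):
        for right in product(alphabet, repeat=n_right):
            yield "".join(left) + specified + "".join(right)


if __name__ == "__main__":
    # A 6-base core with one unspecified base on each side.
    pool = list(expand_probe("ACGTAC", n_left=1, n_right=1))
    print(len(pool), "sequences, e.g.", pool[:3])
```

A pool with n + m unspecified positions therefore contains
4^(n+m) members, each carrying the same specified bases.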

EXAMPLE IV
PREPARATION OF NUCLEIC ACID FRAGMENTS

The nucleic acids to be sequenced may be obtained
from any appropriate source, such as cDNAs, genomic DNA,
chromosomal DNA, microdissected chromosome bands, cosmid
or YAC inserts, and RNA, including mRNA without any
amplification steps. For example, Sambrook et al. (1989)
describe three protocols for the isolation of high
molecular weight DNA from mammalian cells (pp. 9.14-9.23).

The nucleic acids would then be fragmented by any of
the methods known to those of skill in the art including,
for example, using restriction enzymes as described at
pp. 9.24-9.28 of Sambrook et al. (1989), shearing by
ultrasound and NaOH treatment.

Low pressure shearing is also appropriate, as
described by Schriefer et al. (1990, incorporated herein
by reference). In this method, DNA samples are passed
through a small French pressure cell at a variety of low
to intermediate pressures. A lever device allows
controlled application of low to intermediate pressures
to the cell. The results of these studies indicate that
low-pressure shearing is a useful alternative to sonic
and enzymatic DNA fragmentation methods.

One particularly suitable way for fragmenting DNA is
contemplated to be that using the two base recognition
endonuclease, CviJI, described by Fitzgerald et al.
(1992). These authors described an approach for the
rapid fragmentation and fractionation of DNA into
particular sizes that they contemplated to be suitable
for shotgun cloning and sequencing. The present inventor
envisions that this will also be particularly useful for
generating random, but relatively small, fragments of DNA
for use in the present sequencing technology.

The restriction endonuclease CviJI normally cleaves
the recognition sequence PuGCPy between the G and C to
leave blunt ends. Atypical reaction conditions, which
alter the specificity of this enzyme (CviJI**), yield a
quasi-random distribution of DNA fragments from the small
molecule pUC19 (2688 base pairs). Fitzgerald et al.
(1992) quantitatively evaluated the randomness of this
fragmentation strategy, using a CviJI** digest of pUC19
that was size fractionated by a rapid gel filtration
method and directly ligated, without end repair, to a
lacZ minus M13 cloning vector. Sequence analysis of 76
clones showed that CviJI** restricts PyGCPy and PuGCPu,
in addition to PuGCPy sites, and that new sequence data
are accumulated at a rate consistent with random
fragmentation.
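
The site specificities described above can be expressed
compactly in code. The following Python sketch is
illustrative only: it treats the DNA as a linear, single
written strand, assumes complete digestion at every
matching site, and uses hypothetical names (NORMAL,
RELAXED, cut_positions, fragments). It is not a model of
actual digestion kinetics.

```python
import re

PU = "[AG]"  # purine
PY = "[CT]"  # pyrimidine

# Normal CviJI recognizes PuGCPy. Under relaxed (CviJI**) conditions,
# PyGCPy and PuGCPu are cut as well; PyGCPu is not listed as a site.
# Lookaheads are used so that overlapping sites are all reported.
NORMAL = re.compile(f"(?=({PU}GC{PY}))")
RELAXED = re.compile(f"(?=({PU}GC{PY}|{PY}GC{PY}|{PU}GC{PU}))")


def cut_positions(seq, pattern):
    """Return 0-based indices of the base following each cut.

    The cut falls between the G and the C of the 4-base site.
    """
    return [m.start() + 2 for m in pattern.finditer(seq.upper())]


def fragments(seq, pattern):
    """Split a linear sequence at every cut position (blunt ends)."""
    cuts = [0] + cut_positions(seq, pattern) + [len(seq)]
    return [seq[a:b] for a, b in zip(cuts, cuts[1:])]


if __name__ == "__main__":
    demo = "AAGGCTTTTCGCCAATGCA"  # arbitrary example sequence
    print("CviJI:  ", fragments(demo, NORMAL))
    print("CviJI**:", fragments(demo, RELAXED))
```

In this arbitrary example, the relaxed specificity adds a
cut at the PyGCPy site while the PyGCPu site remains
uncut.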

As reported in the literature, advantages of this
approach compared to sonication and agarose gel
fractionation include: smaller amounts of DNA are
required (0.2-0.5 µg instead of 2-5 µg); and fewer steps
are involved (no preligation, end repair, chemical
extraction, or agarose gel electrophoresis and elution
are needed). These advantages are also proposed to be of
use when preparing DNA for sequencing by Format 3.

Irrespective of the manner in which the nucleic acid
fragments are obtained or prepared, it is important to
denature the DNA to give single stranded pieces available
for hybridization. This is achieved by incubating the
DNA solution for 2-5 minutes at 80-90°C. The solution is
then cooled quickly to 2°C to prevent renaturation of the
DNA fragments before they are contacted with the chip.
Phosphate groups must also be removed from genomic DNA,
as described in Example VI.

EXAMPLE V
PREPARATION OF LABELLED PROBES

The oligonucleotide probes may be prepared by
automated synthesis, which is routine to those of skill
in the art, for example, using an Applied Biosystems
system. Alternatively, probes may be prepared using
Genosys Biotechnologies Inc. methods using stacks of
porous Teflon wafers.

Oligonucleotide probes may be labelled with, for
example, radioactive labels (35S, 32P, 33P, and
preferably, 33P) for arrays with 100-200 µm or 100-400 µm
spots; non-radioactive isotopes (Jacobsen et al., 1990);
or fluorophores (Brumbaugh et al., 1988). All such
labelling methods are routine in the art, as exemplified
by the relevant sections in Sambrook et al. (1989) and by
further references such as Schubert et al. (1990),
Murakami et al. (1991) and Cate et al. (1991), all
articles being specifically incorporated herein by
reference.

In regard to radiolabeling, the common methods are
end-labelling using T4 polynucleotide kinase or high
specific activity labelling using Klenow or even T7
polymerase. These are described as follows.

Synthetic oligonucleotides are synthesized without a
phosphate group at their 5' termini and are therefore
easily labeled by transfer of the γ-32P or γ-33P from
[γ-32P]ATP or [γ-33P]ATP using the enzyme bacteriophage T4
polynucleotide kinase. If the reaction is carried out
efficiently, the specific activity of such probes can be
as high as the specific activity of the [γ-32P]ATP or
[γ-33P]ATP itself. The reaction described below is designed
to label 10 pmoles of an oligonucleotide to high specific
activity. Labeling of different amounts of
oligonucleotide can easily be achieved by increasing or
decreasing the size of the reaction, keeping the
concentrations of all components constant.



A reaction mixture would be created using 1.0 µl of
oligonucleotide (10 pmoles/µl); 2.0 µl of 10 x
bacteriophage T4 polynucleotide kinase buffer; 5.0 µl of
[γ-32P]ATP or [γ-33P]ATP (sp. act. 5000 Ci/mmole; 10
mCi/ml in aqueous solution) (10 pmoles); and 11.4 µl of
water. Eight (8) units (~1 µl) of bacteriophage T4
polynucleotide kinase is added to the reaction mixture,
mixed well, and incubated for 45 minutes at 37°C. The
reaction is heated for 10 minutes at 68°C to inactivate
the bacteriophage T4 polynucleotide kinase.
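
Because the reaction is scaled by keeping all
concentrations constant, the volumes for other amounts of
oligonucleotide follow by simple proportion. The Python
sketch below is a worked illustration only; the names
BASE_RECIPE_UL, scale_recipe and atp_pmol are
hypothetical, and the enzyme volume of 1 µl is the
approximate figure given above.

```python
# Volumes in microliters for labeling 10 pmol of oligonucleotide,
# taken from the reaction mixture described in the text.
BASE_RECIPE_UL = {
    "oligonucleotide (10 pmol/ul)": 1.0,
    "10x T4 polynucleotide kinase buffer": 2.0,
    "[gamma-32P]ATP or [gamma-33P]ATP (10 mCi/ml)": 5.0,
    "water": 11.4,
    "T4 polynucleotide kinase (8 units, ~1 ul)": 1.0,
}
BASE_PMOL = 10.0


def scale_recipe(target_pmol):
    """Scale every component linearly so concentrations stay constant."""
    factor = target_pmol / BASE_PMOL
    return {name: round(vol * factor, 2) for name, vol in BASE_RECIPE_UL.items()}


def atp_pmol(volume_ul, mci_per_ml=10.0, ci_per_mmol=5000.0):
    """Picomoles of labeled ATP in `volume_ul` of stock solution."""
    curies = volume_ul * 1e-3 * mci_per_ml * 1e-3  # ul -> ml, mCi -> Ci
    return curies / ci_per_mmol * 1e9              # mmol -> pmol


if __name__ == "__main__":
    print("ATP in 5 ul of stock:", round(atp_pmol(5.0), 3), "pmol")
    for name, vol in scale_recipe(25.0).items():
        print(f"{vol:6.2f} ul  {name}")
```

The helper atp_pmol reproduces the statement that 5.0 µl
of the labeled ATP stock corresponds to 10 pmoles, i.e.,
an amount equimolar to the oligonucleotide.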

The efficiency of transfer of 32P or 33P to the
oligonucleotide and its specific activity are then
determined. If the specific activity of the probe is
acceptable, it is purified. If the specific activity is
too low, an additional 8 units of enzyme is added and
incubated for a further 30 minutes at 37°C before heating
the reaction for 10 minutes at 68°C to inactivate the
enzyme.

Purification of radiolabeled oligonucleotides can be
achieved by precipitation with ethanol; precipitation
with cetylpyridinium bromide; by chromatography through
Bio-Gel P-60; or by chromatography on a Sep-Pak C18
column.

Probes of higher specific activities can be obtained
using the Klenow fragment of E. coli DNA polymerase I
to synthesize a strand of DNA complementary to the
synthetic oligonucleotide. A short primer is hybridized
to an oligonucleotide template whose sequence is the
complement of the desired radiolabeled probe. The primer
is then extended using the Klenow fragment of E. coli DNA
polymerase I to incorporate [α-32P]dNTPs or [α-33P]dNTPs
in a template-directed manner. After the reaction, the
template and product are separated by denaturation
followed by electrophoresis through a polyacrylamide gel
under denaturing conditions. With this method, it is
possible to generate oligonucleotide probes that contain
several radioactive atoms per molecule of
oligonucleotide, if desired.

To use this method, one would mix in a microfuge
tube the calculated amounts of [α-32P]dNTPs or
[α-33P]dNTPs necessary to achieve the desired specific
activity and sufficient to allow complete synthesis of
all template strands. The concentration of dNTPs should
not be less than 1 µM at any stage during the reaction.
Then add to the tube the appropriate amounts of primer
and template DNAs, with the primer being in three- to
tenfold molar excess over the template.

0.1 volume of 10 x Klenow buffer would then be added
and mixed well. 2-4 units of the Klenow fragment of
E. coli DNA polymerase I would then be added per 5 µl of
reaction volume, mixed and incubated for 2-3 hours at
4°C. If desired, the progress of the reaction may be
monitored by removing small (0.1 µl) aliquots and
measuring the proportion of radioactivity that has become
precipitable with 10% trichloroacetic acid (TCA).

The reaction would be diluted with an equal volume
of gel-loading buffer, heated to 80°C for 3 minutes, and
then the entire sample loaded on a denaturing
polyacrylamide gel. Following electrophoresis, the gel
is autoradiographed, allowing the probe to be localized
and removed from the gel. Various methods for
fluorophore labelling are also available, as follows.

Brumbaugh et al. (1988) describe the synthesis of
fluorescently labeled primers. A deoxyuridine analog
with a primary amine "linker arm" of 12 atoms attached at
C-5 is synthesized. Synthesis of the analog consists of
derivatizing 2'-deoxyuridine through organometallic
intermediates to give 5-(methyl
propenoyl)-2'-deoxyuridine. Reaction with
dimethoxytrityl-chloride produces the corresponding
5'-dimethoxytrityl adduct. The methyl ester is
hydrolyzed, activated, and reacted with an appropriately
monoacylated alkyl diamine. After purification, the
resultant linker arm nucleosides are converted to
nucleoside analogs suitable for chemical oligonucleotide
synthesis.

Oligonucleotides would then be made that include one
or two linker arm bases by using modified phosphoramidite
chemistry. To a solution of 50 nmol of the linker arm
oligonucleotide in 25 µl of 500 mM sodium bicarbonate (pH
9.4) is added 20 µl of 300 mM FITC in dimethyl sulfoxide.
The mixture is agitated at room temperature for 6 hr.
The oligonucleotide is separated from free FITC by
elution from a 1 x 30 cm Sephadex G-25 column with 20 mM
ammonium acetate (pH 6), combining fractions in the first
UV-absorbing peak.

In general, fluorescent labelling of an
oligonucleotide at its 5'-end initially involved two
steps. First, an N-protected aminoalkyl phosphoramidite
derivative is added to the 5'-end of an oligonucleotide
during automated DNA synthesis. After removal of all
protecting groups, the NHS ester of an appropriate
fluorescent dye is coupled to the 5'-amino group
overnight, followed by purification of the labelled
oligonucleotide from the excess of dye using reverse
phase HPLC or PAGE.

Schubert et al. (1990) described the synthesis of a
phosphoramidite that enables oligonucleotides labeled
with fluorescein to be produced during automated DNA
synthesis. Fluorescein methylester is alkylated with
4-chloro(4,4'-dimethoxytrityl)butanol-1 in the presence of
K2CO3 and KI in DMF for 17 hrs. After removal of the
trityl group with 1% TFA in chloroform, the product is
phosphitylated by standard procedures with
bis(diisopropylamino)methoxyphosphine. Phosphorylation
of the above obtained fluorescein derivative leads to an
H-phosphonate in reasonable yields. The resulting amidite
(0.1 M solution in dry acetonitrile) is used for the
automated synthesis of different primers using
β-cyanoethyl phosphoramidite chemistry and a DNA
synthesizer. Cleavage from the support and deprotection
is performed with 25% aqueous ammonia for 36 hrs at room
temperature. The crude product is purified by PAGE and
the labelled primer is visible as a pale green
fluorescent band at 310 nm. Elution and desalting using
RP-18 cartridges yields the desired product.

The fluorescent labelling of the 5'-end of a probe
in the Schubert method is directly achieved during DNA
synthesis in the last coupling cycle. Coupling yields
are as high as with the normal phosphoramidites. After
deprotection and removal of ammonia by lyophilization
using a speed vac or by ethanol precipitation,
fluorescent labelled oligonucleotides can be directly
used for DNA sequencing in Format 3 SBH.

Murakami et al. (1991) also described the preparation of
fluorescein-labeled oligonucleotides. This synthesis is
based on a polymer-supported phosphoramidite and hydrogen
phosphonate method. Ethylenediamine or
hexamethylenediamine is used as a tether. The tether is
introduced via a phosphoramidate linkage, which is
formed by oxidation of a hydrogen-phosphonate
intermediate in CCl4 solution. The modified
oligonucleotides are subjected to labeling using a
primary amine orienting reagent, FITC, on the beads. The
resulting modified oligonucleotide is cleaved from the
beads and subsequently purified by RPLC.

Cate et al. (1991) describe the use of
oligonucleotide probes directly conjugated to alkaline
phosphatase in combination with a direct chemiluminescent
substrate (AMPPD) to allow probe detection. Alkaline
phosphatase may be covalently coupled to a modified base
of the oligonucleotide. After hybridization, the oligo
would be incubated with AMPPD. The alkaline phosphatase
enzyme breaks AMPPD to yield a compound that produces
fluorescence without excitation, i.e., a laser is not
needed. It is contemplated that a strong signal can be
generated using such technology.

Labelled probes could readily be purchased from a
variety of commercial sources, including GENSET, rather
than synthesized.

EXAMPLE VI

REMOVAL OF PHOSPHATE GROUPS

Both bacterial alkaline phosphatase (BAP) and calf
intestinal alkaline phosphatase (CIP) catalyze the
removal of 5'-phosphate residues from DNA and RNA. They
are therefore appropriate for removing 5' phosphates from
DNA and/or RNA to prevent ligation and inappropriate
hybridization. Phosphate removal, as described by
Sambrook et al. (1989), would be performed after cutting,
or otherwise shearing, the genomic DNA.

BAP is the more active of the two alkaline
phosphatases, but it is also far more resistant to heat
and detergents. It is therefore difficult to inhibit BAP
completely at the end of dephosphorylation reactions.
Proteinase K is used to digest CIP, which must be
completely removed if subsequent ligations are to work
efficiently. An alternative method is to inactivate the
CIP by heating to 65°C for 1 hour (or 75°C for 10
minutes) in the presence of 5 mM EDTA (pH 8.0) and then
to purify the dephosphorylated DNA by extraction with
phenol:chloroform.

EXAMPLE VII

CONDUCTING SEQUENCING BY TWO STEP HYBRIDIZATION

Following are certain examples to describe the
execution of the sequencing methodology contemplated by
the inventor. First, the whole chip would be hybridized
with a mixture of DNA as complex as 100 million bp (one
human chromosome). Guidelines for conducting
hybridization can be found in papers such as Drmanac
et al. (1990); Khrapko et al. (1991); and Broude et al.
(1994). These articles teach the ranges of hybridization
temperatures, buffers and washing steps that are
appropriate for use in the initial step of Format 3 SBH.

The present inventor particularly contemplates that
hybridization is to be carried out for up to several
hours in high salt concentrations at a low temperature
(~2°C to 5°C) because of a relatively low concentration
of target DNA that can be provided. For this purpose,
SSC buffer is used instead of sodium phosphate buffer
(Drmanac et al., 1990), which precipitates at 10°C.
Washing does not have to be extensive (a few minutes)
because of the second step, and can be completely
eliminated when the hybridization cycling is used for the
sequencing of highly complex DNA samples. The same
buffer is used for hybridization and washing steps to be
able to continue with the second hybridization step with
labeled probes.

After proper washing using a simple robotic device
on each array, e.g., an 8 x 8 mm array (Example III), one
labeled probe, e.g., a 6-mer, would be added. A 96-tip
or 96-pin device would be used, performing this in 42
operations. Again, a range of discriminatory conditions
could be employed, as previously described in the
scientific literature.

The present inventor particularly contemplates the
use of the following conditions. First, after adding
labeled probes and incubating for several minutes only
(because of the high concentration of added
oligonucleotides) at a low temperature (0-5°C), the
temperature is increased to 3-10°C, depending on F+P
length, and the washing buffer is added. At this time,
the washing buffer used is one compatible with any
ligation reaction (e.g., 100 mM salt concentration
range). After adding ligase, the temperature is
increased again to 15-37°C to allow fast ligation (less
than 30 min) and further discrimination of full match and
mismatch hybrids.

The use of cationic detergents is also contemplated
for use in Format 3 SBH, as described by Pontius & Berg
(1991, incorporated herein by reference). These authors
describe the use of two simple cationic detergents,
dodecyl- and cetyltrimethylammonium bromide (DTAB and
CTAB) in DNA renaturation.

DTAB and CTAB are variants of the quaternary amine
tetramethylammonium bromide (TMAB) in which one of the
methyl groups is replaced by either a 12-carbon (DTAB) or
a 16-carbon (CTAB) alkyl group. TMAB is the bromide salt
of the tetramethylammonium ion, a reagent used in nucleic
acid renaturation experiments to decrease the G-C-content
bias of the melting temperature. DTAB and CTAB are
similar in structure to sodium dodecyl sulfate (SDS),
with the replacement of the negatively charged sulfate of
SDS by a positively charged quaternary amine. While SDS
is commonly used in hybridization buffers to reduce
nonspecific binding and inhibit nucleases, it does not
greatly affect the rate of renaturation.

When using a ligation process, the enzyme could be
added with the labeled probes or after the proper washing
step to reduce the background.

Although not previously proposed for use in any SBH
method, ligase technology is well established within the
field of molecular biology. For example, Hood and
colleagues described a ligase-mediated gene detection
technique (Landegren et al., 1988), the methodology of
which can be readily adapted for use in Format 3 SBH.
Landegren et al. describe an assay for the presence of
given DNA sequences based on the ability of two
oligonucleotides to anneal immediately adjacent to each
other on a complementary target DNA molecule. The two
oligonucleotides are then joined covalently by the action
of a DNA ligase, provided that the nucleotides at the
junction are correctly base-paired. Although not
previously contemplated, this situation now arises in
Format 3 sequencing. Wu & Wallace also describe the use
of bacteriophage T4 DNA ligase to join two adjacent,
short synthetic oligonucleotides. Their oligo ligation
reactions were carried out in 50 mM Tris-HCl pH 7.6, 10
mM MgCl2, 1 mM ATP, 1 mM DTT, and 5% PEG. Ligation
reactions were heated to 100°C for 5-10 min followed by
cooling to 0°C prior to the addition of T4 DNA ligase (1
unit; Bethesda Research Laboratory). Most ligation
reactions were carried out at 30°C and terminated by
heating to 100°C for 5 min.

Final washing appropriate for discriminating
detection of hybridized adjacent, or ligated,
oligonucleotides of length (F + P), is then performed.
This washing step is done in water for several minutes at
40-60°C to wash out all the non-ligated labeled probes,
and all other compounds, to maximally reduce background.
Because the labeled oligonucleotides are covalently bound,
detection is simplified (it is not subject to time and
low-temperature constraints).

Depending on the label used, imaging of the chips is
done with different apparatus. For radioactive labels,
phosphor storage screen technology and a PhosphorImager as
a scanner may be used (Molecular Dynamics, Sunnyvale,
CA). Chips are put in a cassette and covered by a
phosphor screen. After 1-4 hours of exposure, the
screen is scanned and the image file stored on a computer
hard disk. For the detection of fluorescent labels, CCD
cameras and epifluorescent or confocal microscopy are
used. For the chips generated directly on the pixels of
a CCD camera, detection can be performed as described by
Eggers et al. (1994, incorporated herein by reference).

Charge-coupled device (CCD) detectors serve as
active solid supports that quantitatively detect and
image the distribution of labeled target molecules in
probe-based assays. These devices use the inherent
characteristics of microelectronics that accommodate
highly parallel assays, ultrasensitive detection, high
throughput, integrated data acquisition and computation.
Eggers et al. (1994) describe CCDs for use with
probe-based assays, such as Format 3 SBH of the present
invention, that allow quantitative assessment within
seconds due to the high sensitivity and direct coupling
employed.

The integrated CCD detection approach enables the 
detection of molecular binding events on chips. The 
detector rapidly generates a two-dimensional pattern that 
uniquely characterizes the sample. In the specific 
operation of the CCD-based molecular detector, distinct 
biological probes are immobilized directly on the pixels 
of a CCD or can be attached to a disposable cover slip 
placed on the CCD surface. The sample molecules can be 






WO 95/09248 






PCT/US94/10945 



labeled with radioisotope, chemiluminescent or
fluorescent tags.

Upon exposure of the sample to the CCD-based probe
array, photons or radioisotope decay products are emitted
at the pixel locations where the sample has bound, in the
case of Format 3, to two complementary probes. In turn,
electron-hole pairs are generated in the silicon when the
charged particles, or radiation from the labeled sample,
are incident on the CCD gates. Electrons are then
collected beneath adjacent CCD gates and sequentially
read out on a display module. The number of
photoelectrons generated at each pixel is directly
proportional to the number of molecular binding events in
such proximity. Consequently, molecular binding can be
quantitatively determined (Eggers et al., 1994).

As recently reported, silicon-based CCDs have
advantages as solid-state detection and imaging sensors
primarily because of the high sensitivity of the devices
over a wide wavelength range (from 1 to 10,000 Å).
Silicon is very responsive to electromagnetic radiation
from the visible spectrum to soft X-rays. For visible
light, a single photon incident on the CCD gate results
in a single electron charge packet beneath the gate. A
single soft X-ray beta particle (typically keV to MeV
range) generates thousands to tens of thousands of
electrons. In addition to the high sensitivity, the CCDs
described by Eggers et al. (1994) offer a wide dynamic
range (4 to 5 orders of magnitude) since a detectable
charge packet can range from a few to 10^5 electrons. The
detection response is linear over a wide dynamic range.

By placing the imaging array in proximity to the
sample, the collection efficiency is improved by a factor
of at least 10 over lens-based techniques such as those
found in conventional CCD cameras. That is, the sample
(emitter) is in near contact with the detector (imaging
array), and this eliminates conventional imaging optics
such as lenses and mirrors.

When radioisotopes are attached as reporter groups
to the target molecules, energetic particles are
detected. Several reporter groups that emit particles of
varying energies have been successfully utilized with the
micro-fabricated detectors, including 32P, 33P, 35S, 14C
and 125I. The higher energy particles, such as from 32P,
provide the highest molecular detection sensitivity,
whereas the lower energy particles, such as from 35S,
provide better resolution. Hence, the choice of the
radioisotope reporter can be tailored as required. Once
the particular radioisotope label is selected, the
detection performance can be predicted by calculating the
signal-to-noise ratio (SNR), as described by Eggers
et al. (1994).

An alternative luminescent detection procedure
involves the use of fluorescent or chemiluminescent
reporter groups attached to the target molecules. The
fluorescent labels can be attached covalently or through
interaction. Fluorescent dyes, such as ethidium bromide,
with intense absorption bands in the near UV (300-350 nm)
range and principal emission bands in the visible
(500-650 nm) range, are most suited for the CCD devices
employed since the quantum efficiency is several orders
of magnitude lower at the excitation wavelength than at
the fluorescent signal wavelength.

From the perspective of detecting luminescence, the
polysilicon CCD gates have the built-in capacity to
filter away the contribution of incident light in the UV
range, yet are very sensitive to the visible luminescence
generated by the fluorescent reporter groups. Such
inherently large discrimination against UV excitation
enables large SNRs (greater than 100) to be achieved by
the CCDs as formulated in the incorporated paper by
Eggers et al. (1994).

For probe immobilization on the detector,
hybridization matrices may be produced on inexpensive
SiO2 wafers, which are subsequently placed on the surface
of the CCD following hybridization and drying. This
format is economically efficient since the hybridization
of the DNA is conducted on inexpensive disposable SiO2
wafers, thus allowing reuse of the more expensive CCD
detector. Alternatively, the probes can be immobilized
directly on the CCD to create a dedicated probe matrix.

To immobilize probes upon the SiO2 coating, a
uniform epoxide layer is linked to the film surface,
employing an epoxy-silane reagent and standard SiO2
modification chemistry. Amine-modified oligonucleotide
probes are then linked to the SiO2 surface by means of
secondary amine formation with the epoxide ring. The
resulting linkage provides 17 rotatable bonds of
separation between the 3' base of the oligonucleotide and
the SiO2 surface. To ensure complete amine deprotonation
and to minimize secondary structure formation during
coupling, the reaction is performed in 0.1 M KOH and
incubated at 37°C for 6 hours.

In Format 3 SBH in general, signals are scored for
each of a billion points. It would not be necessary to
hybridize all arrays, e.g., 4000 arrays of 5 x 5 mm, at
one time; the successive use of a smaller number of
arrays is possible.

Cycling hybridizations are one possible method for
increasing the hybridization signal. In one cycle, most
of the fixed probes will hybridize with DNA fragments
with tail sequences non-complementary for labelled
probes. By increasing the temperature, those hybrids
will be melted (FIG. 3). In the next cycle, some of them
(~0.1%) will hybridize with an appropriate DNA fragment
and additional labeled probes will be ligated. In this
case, there occurs a discriminative melting of DNA
hybrids with mismatches for both probe sets
simultaneously.

In the cycle hybridization, all components are added
before the cycling starts, at 37°C for T4 ligase, or at a
higher temperature for a thermostable ligase. Then the
temperature is decreased to 15-37°C and the chip is
incubated for up to 10 minutes, and then the temperature
is increased to 37°C or higher for a few minutes and then
again reduced. Cycles can be repeated up to 10 times.
In one variant, an optimal higher temperature (10-50°C)
can be used without cycling and a longer ligation reaction
can be performed (1-3 hours).
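
The cycling protocol can be written down as a simple temperature schedule. The following Python sketch is illustrative only: the function name, the default incubation temperature of 25°C and the 3-minute melting step are assumptions chosen from within the ranges stated above, not values given in the text.

```python
from typing import List, Tuple

def ligation_cycle_schedule(
    low_temp_c: float = 25.0,    # assumed value within the stated 15-37 C range
    high_temp_c: float = 37.0,   # "37 C or higher"
    low_minutes: float = 10.0,   # "up to 10 minutes"
    high_minutes: float = 3.0,   # "a few minutes"; 3 is an assumption
    cycles: int = 10,            # "up to 10 times"
) -> List[Tuple[str, float, float]]:
    """Return (step label, temperature in C, duration in min) for each step."""
    if not 15.0 <= low_temp_c <= 37.0:
        raise ValueError("incubation temperature must be within 15-37 C")
    if high_temp_c < 37.0:
        raise ValueError("melting temperature must be 37 C or higher")
    if not 1 <= cycles <= 10:
        raise ValueError("the text describes up to 10 cycles")
    if low_minutes > 10.0:
        raise ValueError("incubation is described as up to 10 minutes")

    steps: List[Tuple[str, float, float]] = []
    for i in range(1, cycles + 1):
        # Lower temperature: fragments hybridize and adjacent probes are ligated.
        steps.append((f"cycle {i}: hybridize/ligate", low_temp_c, low_minutes))
        # Higher temperature: unproductive hybrids are melted off.
        steps.append((f"cycle {i}: melt", high_temp_c, high_minutes))
    return steps

schedule = ligation_cycle_schedule(cycles=3)
total_minutes = sum(minutes for _, _, minutes in schedule)
for label, temp, minutes in schedule:
    print(f"{label:28s} {temp:5.1f} C  {minutes:4.1f} min")
print("total:", total_minutes, "min")
```

With the assumed defaults, three cycles take 39 minutes; the non-cycling variant would replace the loop with a single step of 1-3 hours.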

The procedure described herein allows complex chip
manufacturing using standard synthesis and precise
spotting of oligonucleotides because a relatively small
number of oligonucleotides are necessary. For example, if
all 7-mer oligos are synthesized (16384 probes), lists of
4^14 (approximately 268 million) 14-mers can be
determined.
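
The arithmetic behind this economy can be checked directly. The short Python sketch below (function names are illustrative) counts the probes of a given length and shows that pairing every fixed 7-mer with every labelled 7-mer covers all 4^14 = 268,435,456 (that is, 256 x 2^20) possible 14-mers.

```python
from itertools import product

BASES = "ACGT"

def count_kmers(k: int) -> int:
    """Number of distinct oligonucleotides of length k."""
    return len(BASES) ** k

def all_kmers(k: int) -> list:
    """Enumerate every k-mer (practical only for small k)."""
    return ["".join(p) for p in product(BASES, repeat=k)]

fixed_probes = count_kmers(7)          # 7-mers to synthesize
labelled_probes = count_kmers(7)
scorable_14mers = fixed_probes * labelled_probes

print(fixed_probes)                    # 16384
print(scorable_14mers)                 # 268435456
print(scorable_14mers == count_kmers(14))
```

Only 16,384 oligonucleotides need to be synthesized, yet every possible 14-mer corresponds to exactly one fixed/labelled pair.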

One important variant of the invented method is to
use more than one differently labeled probe per basic
array. This can be executed with two purposes in mind:
multiplexing to reduce the number of separately hybridized
arrays; or determining a list of even longer
oligosequences such as 3 x 6 or 3 x 7. In this case, if
two labels are used, the specificity of the 3 consecutive
oligonucleotides can be almost absolute because positive
sites must have enough signal from both labels.

A further variant is to use chips
containing BxNy probes with y being from 1 to 4. Those
chips allow sequence reading in different frames. This
can also be achieved by using appropriate sets of labeled
probes, or both F and P probes could have some unspecified
end positions (i.e., some element of terminal
degeneracy). Universal bases may also be employed as
part of a linker to join the probes of defined sequence
to the solid support. This makes the probe more
available to hybridization and makes the construct more
stable. If a probe has 5 bases, one may, e.g., use 3
universal bases as a linker (FIG. 4).



EXAMPLE VIII

ANALYZING THE DATA OBTAINED 

Image files are analyzed by an image analysis
program, such as the DOTS program (Drmanac et al., 1993),
and scaled and evaluated by statistical functions included,
e.g., in the SCORES program (Drmanac et al., 1994). From
the distribution of the signals, an optimal threshold is
determined for transforming signal into +/- output.
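
As an illustration of converting signals into +/- output, the Python sketch below picks a threshold from the signal distribution and applies it. It is not the procedure of the DOTS or SCORES programs, whose internals are not described here; it assumes a simple iterative two-class rule (the threshold is moved to the midpoint between the mean of the low signals and the mean of the high signals until it stabilizes), and the signal values are invented for the example.

```python
def two_class_threshold(signals, tol=1e-6, max_iter=100):
    """Iteratively place a threshold midway between the two class means.

    Assumed method for illustration; any bimodal-threshold rule could be used.
    """
    if not signals:
        raise ValueError("no signals given")
    t = sum(signals) / len(signals)            # start at the overall mean
    for _ in range(max_iter):
        low = [s for s in signals if s <= t]
        high = [s for s in signals if s > t]
        if not low or not high:                # cannot split into two classes
            break
        new_t = (sum(low) / len(low) + sum(high) / len(high)) / 2
        if abs(new_t - t) < tol:
            t = new_t
            break
        t = new_t
    return t

def call_signals(signals, threshold):
    """Transform numeric signals into '+' / '-' calls."""
    return ["+" if s > threshold else "-" for s in signals]

# Hypothetical normalized spot intensities.
signals = [0.10, 0.20, 0.15, 0.90, 1.10, 0.12, 1.00]
threshold = two_class_threshold(signals)
calls = call_signals(signals, threshold)
print(round(threshold, 3), calls)
```

A real analysis would also have to handle arrays in which the signal distribution is not clearly bimodal, which this sketch does not attempt.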

From the position of the label detected, F + P
nucleotide sequences from the fragments would be
determined by combining the known sequences of the
immobilized and labelled probes corresponding to the
labelled positions. The complete nucleic acid sequence
or sequence subfragments of the original molecule, such
as a human chromosome, would then be assembled from the
overlapping F + P sequences determined by computational
deduction.
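
A minimal Python sketch of this lookup step follows. The data layout is an assumption made for illustration: each array is taken to have been hybridized with one labelled probe P, each spot on an array carries one known fixed probe F, and the F + P sequence is written as F followed by P. Identifiers such as `spot_to_fixed` and the array names are hypothetical.

```python
def combine_positive_spots(spot_to_fixed, array_to_labelled, positives):
    """Return the sorted set of F + P sequences for positive (array, spot) pairs.

    spot_to_fixed:     spot position -> sequence of the immobilized probe F
    array_to_labelled: array id      -> sequence of the labelled probe P
    positives:         iterable of (array id, spot position) scored '+'
    """
    combined = {
        spot_to_fixed[spot] + array_to_labelled[array]
        for array, spot in positives
    }
    return sorted(combined)

# Hypothetical example with two spots and two arrays.
spot_to_fixed = {0: "AAAAAA", 1: "AAAAAT"}
array_to_labelled = {"array_1": "TTTTTT", "array_2": "TTTTTA"}
positives = [("array_1", 0), ("array_2", 1)]

f_plus_p = combine_positive_spots(spot_to_fixed, array_to_labelled, positives)
print(f_plus_p)
```

The resulting list of F + P sequences is the input to the overlap-based assembly.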

One option is to transform hybridization signals,
e.g., scores, into +/- output during the sequence
assembly process. In this case, assembly will start with
a F+P sequence with a very high score, for example F+P
sequence AAAAAATTTTTT (SEQ ID NO:1). Scores of all four
possible overlapping probes AAAAATTTTTTA (SEQ ID NO:3),
AAAAATTTTTTT (SEQ ID NO:4), AAAAATTTTTTC (SEQ ID NO:5)
and AAAAATTTTTTG (SEQ ID NO:6) and three additional
probes that are different at the beginning (TAAAAATTTTTT,
SEQ ID NO:7; CAAAAATTTTTT, SEQ ID NO:8; GAAAAATTTTTT, SEQ
ID NO:9) are compared and three outcomes defined: (i)
only the starting probe and only one of the four
overlapping probes have scores that are significantly
positive relative to the other six probes, in which case
the AAAAAATTTTTT (SEQ ID NO:1) sequence will be extended
by one nucleotide to the right; (ii) no probe
except the starting probe has a significantly positive
score, in which case assembly will stop, e.g., the
AAAAAATTTTT (SEQ ID NO:10) sequence is at the end of the
DNA molecule that is sequenced; (iii) more than one
significantly positive probe among the overlapped and/or
other three probes is found, in which case assembly is
stopped because of the error or branching (Drmanac
et al., 1989).
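
The three outcomes can be sketched in Python as follows. This is a simplified illustration, not the algorithms of the cited references: it assumes the scores have already been reduced to a set of significantly positive F + P sequences, it extends only to the right, and it treats any positive probe that differs at the first position as a reason to stop. The function names and the example target are hypothetical.

```python
BASES = "ACGT"

def extension_step(current, positives):
    """Classify one step for the F + P sequence `current`.

    Returns ("extend", base), ("end", None) or ("stop", None).
    """
    suffix = current[1:]
    # The four probes that overlap `current` shifted one base to the right.
    right = [suffix + b for b in BASES if suffix + b in positives]
    # The three probes that differ from `current` only at the first base.
    alternative = [b + suffix for b in BASES
                   if b != current[0] and b + suffix in positives]
    if len(right) == 1 and not alternative:
        return ("extend", right[0][-1])      # outcome (i)
    if not right and not alternative:
        return ("end", None)                 # outcome (ii)
    return ("stop", None)                    # outcome (iii): error or branching

def assemble(start, positives, max_len=1000):
    """Extend `start` to the right until the end or a branch is reached."""
    k = len(start)
    sequence = start
    while len(sequence) < max_len:           # guard against repeats looping
        outcome, base = extension_step(sequence[-k:], positives)
        if outcome != "extend":
            return sequence, outcome
        sequence += base
    return sequence, "max_len"

# Hypothetical positive 12-mers from the target AAAAAATTTTTTACG.
target = "AAAAAATTTTTTACG"
positives = {target[i:i + 12] for i in range(len(target) - 11)}
result = assemble("AAAAAATTTTTT", positives)
print(result)

# A second positive overlapping probe creates a branch point.
branched = extension_step("AAAAAATTTTTT", positives | {"AAAAATTTTTTG"})
print(branched)
```

In this example the assembly recovers the full target and reports that its end was reached, while the added probe AAAAATTTTTTG makes the first step ambiguous and halts assembly.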

The processes of computational deduction would
employ computer programs using existing algorithms (see,
e.g., Pevzner, 1989; Drmanac et al., 1991; Labat and
Drmanac, 1993; each incorporated herein by reference).

If, in addition to F + P, F(space 1)P, F(space 2)P,
F(space 3)P or F(space 4)P are determined, algorithms
will be used to match all data sets to correct potential
errors or to solve the situation where there is a
branching problem (see, e.g., Drmanac et al., 1989; Bains
et al., 1988; each incorporated herein by reference).

EXAMPLE IX
RE-USING SEQUENCING CHIPS

When ligation is employed in the sequencing process,
then the ordinary oligonucleotide chip cannot be
immediately reused. The inventor contemplates that this
may be overcome in various ways.

One may employ ribonucleotides for the second probe,
probe P, so that this probe may subsequently be removed
by RNAase treatment. RNAase treatment may utilize RNAase
A, an endoribonuclease that specifically attacks
single-stranded RNA 3' to pyrimidine residues and cleaves the
phosphate linkage to the adjacent nucleotide. The end
products are pyrimidine 3' phosphates and
oligonucleotides with terminal pyrimidine 3' phosphates.
RNAase A works in the absence of cofactors and divalent
cations.
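
The cleavage specificity described here can be illustrated with a short Python sketch. It is an idealized model that assumes complete digestion of a fully single-stranded RNA and cuts after every pyrimidine (C or U); the function name and the example sequence are hypothetical.

```python
def rnase_a_digest(rna):
    """Split an RNA sequence (5'->3') after every pyrimidine residue.

    Idealized complete digestion: each fragment except possibly the last
    ends in C or U, corresponding to a terminal pyrimidine 3' phosphate.
    """
    rna = rna.upper()
    unexpected = set(rna) - set("ACGU")
    if unexpected:
        raise ValueError(f"not an RNA sequence: {sorted(unexpected)}")
    fragments = []
    current = ""
    for base in rna:
        current += base
        if base in "CU":              # cleave 3' to a pyrimidine
            fragments.append(current)
            current = ""
    if current:                       # trailing purine-only remainder
        fragments.append(current)
    return fragments

fragments = rnase_a_digest("GGAUCAAGC")
print(fragments)
```

For the example sequence the products are GGAU, C and AAGC: one free pyrimidine nucleotide and two oligonucleotides ending in a pyrimidine.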

To utilize an RNAase, one would generally incubate
the chip in any appropriate RNAase-containing buffer, as
described by Sambrook et al. (1989; incorporated herein
by reference). The use of 30-50 µl of RNAase-containing
buffer per 8 x 8 mm or 9 x 9 mm array at 37°C for between
10 and 60 minutes is appropriate. One would then wash
with hybridization buffer.

Although not widely applicable, one could also use
the uracil base, as described by Craig et al. (1989),
incorporated herein by reference, in specific
embodiments. Destruction of the ligated probe
combination, to yield a re-usable chip, would be achieved
by digestion with the E. coli repair enzyme, uracil-DNA
glycosylase, which removes uracil from DNA.

One could also generate a specifically cleavable
bond between the probes and then cleave the bond after
detection. For example, this may be achieved by chemical
ligation as described by Shabarova et al. (1991) and
Dolinnaya et al. (1988), both references being
specifically incorporated herein by reference.

Shabarova et al. (1991) describe the condensation of
oligodeoxyribonucleotides with cyanogen bromide as a
condensing agent. In their one-step chemical ligation
reaction, the oligonucleotides are heated to 97°C, slowly
cooled to 0°C, then 1 µl of 10 M BrCN in acetonitrile is
added.

Dolinnaya et al. (1988) show how to incorporate
phosphoramidate and pyrophosphate internucleotide bonds
in DNA duplexes. They also use a chemical ligation
method for modification of the sugar phosphate backbone
of DNA, with a water-soluble carbodiimide (CDI) as a
coupling agent. The selective cleavage of a phosphoamide
bond involves contact with 15% CH3COOH for 5 min at
95°C. The selective cleavage of a pyrophosphate bond
involves contact with a pyridine-water mixture (9:1) and
freshly distilled (CF3CO)2O.



While the compositions and methods of this invention
have been described in terms of preferred embodiments, it
will be apparent to those of skill in the art that
variations may be applied to the composition, methods and
in the steps or in the sequence of steps of the method
described herein without departing from the concept,
spirit and scope of the invention. More specifically, it
will be apparent that certain agents that are both
chemically and physiologically related may be substituted
for the agents described herein while the same or similar
results would be achieved. All such similar substitutes
and modifications apparent to those skilled in the art
are deemed to be within the spirit, scope and concept of
the invention as defined by the appended claims. All
claimed matter and methods can be made and executed
without undue experimentation.

CLAIMS

1. A method for determining the sequence of a nucleic
acid molecule, comprising the steps of:

(a) identifying sequences from the molecule by
hybridizing the molecule to complementary
sequences from two sets of small
oligonucleotide probes of known sequence,
wherein the first set of probes are attached to
a solid support and the second set of probes
are labelled probes in solution;

(b) identifying overlapping stretches of sequence
from the sequences identified in step (a); and

(c) assembling the nucleic acid sequence of the
molecule from said overlapping sequences
identified.



2. The method of claim 1, wherein said hybridization is 
carried out in cycles. 

3. A method for determining the sequence of a nucleic
acid molecule, comprising the steps of:

(a) fragmenting the nucleic acid molecule to be
sequenced to provide intermediate length
nucleic acid fragments;

(b) identifying sequences from said fragments by
hybridizing the fragments to complementary
sequences from two sets of small
oligonucleotide probes of known sequence,
wherein the first set of probes are attached to
a solid support and the second set of probes
are labelled probes in solution;

(c) identifying overlapping stretches of sequence
from said sequences identified in step (b); and

(d) assembling the nucleic acid sequence of the
molecule from said overlapping sequences
identified.



4. The method of claim 3, wherein said fragments are
sequentially hybridized to complementary sequences from
two sets of small oligonucleotide probes of known
sequence.

5. The method of claim 3, wherein said fragments are
simultaneously hybridized to complementary sequences from
two sets of small oligonucleotide probes of known
sequence.

6. The method of claim 3, wherein said intermediate
length nucleic acid fragments are between about
10 nucleotides and about 40 nucleotides in length and
said small oligonucleotide probes are between about
4 nucleotides and about 9 nucleotides in length.

7. The method of claim 3, wherein said oligonucleotide
probes hybridize to completely complementary sequences
from said fragments.

8. The method of claim 3, wherein said oligonucleotide
probes hybridize to immediately adjacent sequences from
said fragments.

9. The method of claim 8, wherein said oligonucleotide
probes hybridize to completely complementary and
immediately adjacent sequences from said fragments.

10. The method of claim 8, wherein said immediately
adjacent oligonucleotide probes are subsequently ligated.

11. The method of claim 3, wherein step (b) comprises
the steps of:

(a) contacting said first set of small attached
oligonucleotide probes with said intermediate
length nucleic acid fragments under
hybridization conditions effective to allow
only those fragments with a completely
complementary sequence to hybridize to a probe,
thereby forming primary complexes wherein the
fragment has hybridized and free sequences;

(b) contacting said primary complexes with said
second set of small labelled oligonucleotide
probes under hybridization conditions effective
to allow only those probes with completely
complementary sequences to hybridize to a free
fragment sequence, thereby forming secondary
complexes wherein the fragment is hybridized to
an attached probe and a labelled probe;

(c) removing from said secondary complexes labelled
probes that are not immediately adjacent to an
attached probe, thereby leaving only adjacent
secondary complexes;

(d) detecting said adjacent secondary complexes by
detecting the presence of the label; and

(e) identifying sequences from the nucleic acid
fragments in said adjacent secondary complexes
by connecting the known sequences of the
hybridized attached and labelled probes.



12. A method of nucleic acid sequencing comprising the
steps of:

(a) fragmenting the nucleic acid to be sequenced to
provide nucleic acid fragments of length T;

(b) preparing an array of immobilized
oligonucleotide probes of known sequences and
length F and a set of labelled oligonucleotide
probes in solution of known sequences and
length P, wherein F + P ≤ T;

(c) contacting said array of immobilized
oligonucleotide probes with said nucleic acid
fragments under hybridization conditions
effective to allow the formation of primary
complexes with hybridized, completely
complementary sequences of length F and
non-hybridized fragment sequences of length T - F;

(d) contacting said complexes with said set of
labelled oligonucleotide probes under
hybridization conditions effective to allow
only the formation of secondary complexes with
hybridized, completely complementary sequences
of length F and immediately adjacent
hybridized, completely complementary sequences
of length P;

(e) detecting said secondary complexes by detecting
the presence of the label;

(f) identifying sequences of length F + P from the
nucleic acid fragments in said secondary
complexes by combining the known sequences of
the hybridized immobilized and labelled probes;

(g) determining stretches of said sequences of
length F + P that overlap; and

(h) assembling the complete nucleic acid sequence
from said overlapping sequences.



13. The method of claim 12, wherein length T is about
three times longer than length F.

14. The method of claim 12, wherein length T is between
about 10 nucleotides and about 40 nucleotides, length F
is between about 4 nucleotides and about 9 nucleotides
and length P is between about 4 nucleotides and about
9 nucleotides.

15. The method of claim 14, wherein length T is about
20 nucleotides, length F is about 6 nucleotides and
length P is about 6 nucleotides.

16. The method of claim 12, wherein said immediately
adjacent immobilized and labeled oligonucleotide probes
are ligated.



17. A method of nucleic acid sequencing comprising the
steps of:

(a) fragmenting the nucleic acid to be sequenced to
provide intermediate length nucleic acid
fragments;

(b) contacting an array of immobilized small
oligonucleotide probes of known sequences with
said nucleic acid fragments under hybridization
conditions effective to allow only those
fragments with a completely complementary
sequence to hybridize to a probe, thereby
forming primary complexes wherein the fragment
has hybridized and non-hybridized sequences;

(c) contacting said primary complexes with a set of
labelled small oligonucleotide probes in
solution of known sequences under hybridization
conditions effective to allow only those probes
with completely complementary sequences to
hybridize to a non-hybridized fragment
sequence, thereby forming secondary complexes
wherein the fragment is hybridized to an
immobilized probe and a labelled probe;

(d) removing from said secondary complexes labelled
probes that are not immediately adjacent to an
immobilized probe, thereby leaving only
adjacent secondary complexes;

(e) detecting said adjacent secondary complexes by
detecting the presence of the label;

(f) identifying sequences from the nucleic acid
fragments in said adjacent secondary complexes
by combining the known sequences of the
hybridized immobilized and labelled probes;

(g) determining stretches of said sequences that
overlap; and

(h) assembling the complete nucleic acid sequence
from said overlapping sequences identified.



18. The method of claim 17, wherein the nucleic acid is
cloned DNA or chromosomal DNA.

19. The method of claim 17, wherein the nucleic acid is
mRNA.

20. The method of claim 17, wherein the nucleic acid is
fragmented by restriction enzyme digestion, ultrasound
treatment, NaOH treatment or low pressure shearing.

21. The method of claim 17, wherein the nucleic acid
fragments are between about 10 nucleotides and about 100
nucleotides in length.

22. The method of claim 17, wherein the oligonucleotide
probes are between about 4 nucleotides and about
9 nucleotides in length.

23. The method of claim 22, wherein the oligonucleotide
probes are about 6 nucleotides in length.

24. The method of claim 17, wherein said immobilized
oligonucleotides are attached to a glass, polystyrene or
teflon solid support.

25. The method of claim 17, wherein said immobilized
oligonucleotides are attached to a solid support via a
phosphodiester linkage.

26. The method of claim 17, wherein said immobilized
oligonucleotides are attached to a solid support via a
light-activated synthetic mechanism.

27. The method of claim 17, wherein the labelled
oligonucleotide probes are labelled with a
non-radioactive isotope or a fluorescent dye.

28. The method of claim 17, wherein the labelled
oligonucleotide probes are labelled with 35S, 32P or 33P.

29. The method of claim 17, wherein said nucleic acid
fragment or one of said oligonucleotide probes contains a
modified base or a universal base.

30. The method of claim 17, wherein labelled probes that
are not immediately adjacent to an immobilized probe are
removed from the secondary complexes by stringent washing
conditions.

31. The method of claim 17, wherein labelled probes that
are immediately adjacent to an immobilized probe are
ligated to said immobilized probe and non-ligated
labelled probes are subsequently removed by washing.

32. The method of claim 31, wherein said adjacent probes
are ligated enzymatically.

33. The method of claim 17, wherein multiple arrays of
immobilized oligonucleotides are arranged in the form of
a sequencing chip.



34. A method of nucleic acid sequencing comprising the
steps of:

(a) fragmenting the nucleic acid to be sequenced to
provide nucleic acid fragments of between about
10 nucleotides and about 40 nucleotides in
length;

(b) contacting an array of immobilized
oligonucleotide probes with known sequences of
between about 4 nucleotides and about
9 nucleotides in length with said nucleic acid
fragments under hybridization conditions
effective to allow only those fragments with a
completely complementary sequence to hybridize
to a probe, thereby forming primary complexes
wherein the fragment has hybridized and
non-hybridized sequences;

(c) contacting said complexes with a set of
32P-labelled or 33P-labelled oligonucleotide
probes with known sequences of between about
4 nucleotides and about 9 nucleotides in length
under hybridization conditions effective to
allow only those labelled probes with
completely complementary sequences to hybridize
to a non-hybridized fragment sequence, thereby
forming secondary complexes wherein the
fragment is hybridized to an immobilized probe
and a 32P-labelled or 33P-labelled probe;

(d) ligating the immobilized probes and labelled
probes that are immediately adjacent with a DNA
ligase enzyme, thereby forming ligated
secondary complexes;

(e) removing from the secondary complexes any
non-ligated labelled probes;

(f) detecting said ligated secondary complexes by
detecting the presence of the 32P or 33P label;

(g) identifying sequences from the nucleic acid
fragments in said ligated secondary complexes
by combining the known sequences of the ligated
probes;

(h) determining stretches of said sequences that
overlap; and

(i) assembling the complete nucleic acid sequence
from said overlapping sequences.



35. A kit for use in nucleic acid sequencing, comprising
a solid support chip having attached an arrangement of
oligonucleotide probes of known sequences, said
oligonucleotides being capable of taking part in
hybridization reactions, and a set of containers
comprising solutions of labelled oligonucleotide probes
of known sequences.

36. The kit of claim 35, wherein multiple chips of
immobilized oligonucleotide probes are arranged in the
form of a sequencing array.

37. The kit of claim 35, wherein the oligonucleotide
probes are between about 4 nucleotides and about
9 nucleotides in length.

38. The kit of claim 37, wherein the oligonucleotide
probes are about 6 nucleotides in length.

39. The kit of claim 35, wherein the oligonucleotide
probes are attached to a glass, polystyrene or teflon
solid support.

40. The kit of claim 35, wherein the oligonucleotide
probes are attached to a solid support via a
phosphodiester linkage.

41. The kit of claim 35, wherein the oligonucleotide
probes are attached to a solid support via a
light-activated synthetic mechanism.

42. The kit of claim 35, wherein the labelled
oligonucleotide probes are labelled with a
non-radioactive isotope or a fluorescent dye.

43. The kit of claim 35, wherein one of the
oligonucleotide probes contains a modified or a universal
base.

44. The kit of claim 35, wherein the labelled
oligonucleotide probes are labelled with 35S, 32P or 33P.

45. The kit of claim 35, further comprising a ligating
agent.

46. The kit of claim 45, wherein the ligating agent is a
DNA ligase enzyme.